Augmented reality device for presenting virtual imagery registered to a viewed surface

ABSTRACT

An augmented reality device for inserting virtual imagery into a user's view of their physical environment. The device comprises: a see-through display device including a wavefront modulator; a camera for imaging a surface in the physical environment; and a controller. The controller is configured for capturing an image of the surface; determining the virtual imagery to be displayed at a predetermined position relative to the surface; determining a position of the surface relative to the augmented reality device; generating an image based on the virtual imagery and on the position of the surface relative to the augmented reality device; and displaying the generated image via the display device. Based on pixel depth information, the controller modulates the wavefront curvature of light emitted for each pixel so that the user sees the virtual imagery at the predetermined position relative to the surface regardless of changes in position of the user's eyes with respect to the display device.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 11/193,481 filed Aug. 1, 2005, the contents of which are herein incorporated by reference.

FIELD OF THE INVENTION

The present invention relates to the fields of interactive paper, printing systems, computer publishing, computer applications, human-computer interfaces, information appliances, augmented reality, and head-mounted displays.

CO-PENDING REFERENCES

Ser. Nos. 11/193,481 11/193,482 11/193,479

CROSS-REFERENCES

6,750,901 6,476,863 6,788,336 7,249,108 6,566,858 6,331,946 6,246,9706,442,525 7,346,586 7,685,423 6,374,354 7,246,098 6,816,968 6,757,8326,334,190 6,745,331 7,249,109 7,197,642 7,093,139 7,509,292 7,685,4247,743,262 7,210,038 7,401,223 7,702,926 7,716,098 7,364,256 7,258,4177,293,853 7,328,968 7,270,395 7,461,916 7,510,264 7,334,864 7,255,4197,284,819 7,229,148 7,258,416 7,273,263 7,270,393 6,984,017 7,347,5267,357,477 7,465,015 7,364,255 7,357,476 7,758,148 7,284,820 7,341,3287,246,875 7,322,669 7,243,835 10/815,630 7,703,693 10/815,638 7,251,05010/815,642 7,097,094 7,137,549 10/815,618 7,156,292 10/815,635 7,357,3237,654,454 7,137,566 7,131,596 7,128,265 7,207,485 7,197,374 7,175,08910/815,617 7,537,160 7,178,719 7,506,808 7,207,483 7,296,737 7,270,2667,605,940 7,128,270 7,784,681 7,677,445 7,506,168 7,441,712 7,663,78911/041,609 11/041,626 7,537,157 7,801,742 7,395,963 7,457,961 7,739,5097,467,300 7,467,299 7,565,542 7,457,007 7,150,398 7,159,777 7,450,2737,188,769 7,097,106 7,070,110 7,243,849 6,623,101 6,406,129 6,505,9166,457,809 6,550,895 6,457,812 7,152,962 6,428,133 7,204,941 7,282,1647,465,342 7,278,727 7,417,141 7,452,989 7,367,665 7,138,391 7,153,9567,423,145 7,456,277 7,550,585 7,122,076 7,148,345 7,470,315 7,572,3277,416,280 7,252,366 7,488,051 7,360,865 6,746,105 7,156,508 7,159,9727,083,271 7,165,834 7,080,894 7,201,469 7,090,336 7,156,489 7,413,2837,438,385 7,083,257 7,258,422 7,255,423 7,219,980 7,591,533 7,416,2747,367,649 7,118,192 7,618,121 7,322,672 7,077,505 7,198,354 7,077,5047,614,724 7,198,355 7,401,894 7,322,676 7,152,959 7,213,906 7,178,9017,222,938 7,108,353 7,104,629 7,246,886 7,128,400 7,108,355 6,991,3227,287,836 7,118,197 7,575,298 7,364,269 7,077,493 6,962,402 7,686,4297,147,308 7,524,034 7,118,198 7,168,790 7,172,270 7,229,155 6,830,3187,195,342 7,175,261 7,465,035 7,108,356 7,118,202 7,510,269 7,134,7447,510,270 7,134,743 7,182,439 7,210,768 7,465,036 7,134,745 7,156,4847,118,201 7,111,926 7,431,433 7,018,021 7,401,901 7,468,139 7,448,7297,246,876 7,431,431 7,419,249 7,377,623 7,334,876 7,249,901 7,477,9877,156,289 7,178,718 7,225,979 7,540,429 7,584,402 11/084,806 7,721,9487,079,712 6,825,945 7,330,974 6,813,039 7,190,474 6,987,506 6,824,0447,038,797 6,980,318 6,816,274 7,102,772 7,350,236 6,681,045 6,678,4996,679,420 6,963,845 6,976,220 6,728,000 7,110,126 7,173,722 6,976,0356,813,558 6,766,942 6,965,454 6,995,859 7,088,459 6,720,985 7,286,1136,922,779 6,978,019 6,847,883 7,131,058 7,295,839 7,406,445 7,533,0316,959,298 6,973,450 7,150,404 6,965,882 7,233,924 7,707,082 7,593,8997,175,079 7,162,259 6,718,061 7,464,880 7,012,710 6,825,956 7,451,1157,222,098 7,590,561 7,263,508 7,031,010 6,972,864 6,862,105 7,009,7386,989,911 6,982,807 7,518,756 6,829,387 6,714,678 6,644,545 6,609,6536,651,879 10/291,555 7,293,240 7,467,185 7,415,668 7,044,363 7,004,3906,867,880 7,034,953 6,987,581 7,216,224 7,506,153 7,162,269 7,162,2227,290,210 7,293,233 7,293,234 6,850,931 6,865,570 6,847,961 10/685,5837,162,442 10/685,584 7,159,784 7,557,944 7,404,144 6,889,896 7,174,0566,996,274 7,162,088 7,388,985 7,417,759 7,362,463 7,259,884 7,167,2707,388,685 6,986,459 10/954,170 7,181,448 7,590,622 7,657,510 7,324,9897,231,293 7,174,329 7,369,261 7,295,922 7,200,591 7,693,828 11/020,26011/020,321 11/020,319 7,466,436 7,347,357 11/051,032 7,382,482 7,602,5157,446,893 11/082,815 7,389,423 7,401,227 6,991,153 6,991,154 7,589,8547,551,305 7,322,524 7,068,382 7,007,851 6,957,921 6,457,883 7,044,3817,094,910 7,091,344 7,122,685 7,038,066 7,099,019 7,062,651 6,789,1946,789,191 
7,529,936 7,278,018 7,360,089 7,526,647 7,467,416 6,644,6426,502,614 6,622,999 6,669,385 6,827,116 7,011,128 7,416,009 6,549,9356,987,573 6,727,996 6,591,884 6,439,706 6,760,119 7,295,332 7,064,8516,826,547 6,290,349 6,428,155 6,785,016 6,831,682 6,741,871 6,927,8716,980,306 6,965,439 6,840,606 7,036,918 6,977,746 6,970,264 7,068,3897,093,991 7,190,491 7,511,847 7,663,780 10/962,412 7,177,054 7,364,28210/965,733 10/965,933 7,728,872 7,468,809 7,180,609 7,538,793 7,466,4387,292,363 7,515,292 6,982,798 6,870,966 6,822,639 6,474,888 6,627,8706,724,374 6,788,982 7,263,270 6,788,293 6,946,672 6,737,591 7,091,9607,369,265 6,792,165 7,105,753 6,795,593 6,980,704 6,768,821 7,132,6127,041,916 6,797,895 7,015,901 7,289,882 7,148,644 10/778,056 10/778,0587,515,186 7,567,279 10/778,062 7,096,199 7,286,887 7,400,937 7,474,9307,324,859 7,218,978 7,245,294 7,277,085 7,187,370 7,609,410 7,660,49010/919,379 7,019,319 7,593,604 7,660,489 7,043,096 7,148,499 7,463,2507,590,311 11/155,557 7,055,739 7,233,320 6,830,196 6,832,717 7,182,2477,120,853 7,082,562 6,843,420 7,793,852 6,789,731 7,057,608 6,766,9446,766,945 7,289,103 7,412,651 7,299,969 7,264,173 7,549,595 7,111,7917,077,333 6,983,878 7,564,605 7,134,598 7,431,219 6,929,186 6,994,2647,017,826 7,014,123 7,134,601 7,150,396 7,469,830 7,017,823 7,025,2767,284,701 7,080,780 7,376,884 10/492,169 7,469,062 7,359,551 7,444,0217,308,148 7,630,962 10/531,229 7,630,553 7,630,554 10/510,391 7,660,4667,526,128 6,957,768 7,456,820 7,170,499 7,106,888 7,123,239 6,982,7016,982,703 7,227,527 6,786,397 6,947,027 6,975,299 7,139,431 7,048,1787,118,025 6,839,053 7,015,900 7,010,147 7,133,557 6,914,593 7,437,6716,938,826 7,278,566 7,123,245 6,992,662 7,190,346 7,417,629 7,468,7247,715,035 7,221,781 11/102,843 6,593,166 7,132,679 6,940,088 7,119,35710/727,162 7,377,608 7,399,043 7,121,639 7,165,824 7,152,942 10/727,1577,181,572 7,096,137 7,302,592 7,278,034 7,188,282 7,592,829 10/727,17910/727,192 7,770,008 7,707,621 7,523,111 7,573,301 7,660,998 7,783,88610/754,938 10/727,160 7,369,270 6,795,215 7,070,098 7,154,638 6,805,4196,859,289 6,977,751 6,398,332 6,394,573 6,622,923 6,747,760 6,921,1447,092,112 7,192,106 7,457,001 7,173,739 6,986,560 7,008,033 7,551,3247,195,328 7,182,422 7,374,266 7,427,117 7,448,707 7,281,330 10/854,5037,328,956 7,735,944 7,188,928 7,093,989 7,377,609 7,600,843 10/854,4987,390,071 10/854,526 7,549,715 7,252,353 7,607,757 7,267,417 10/854,5057,517,036 7,275,805 7,314,261 7,281,777 7,290,852 7,484,831 7,758,14310/854,527 7,549,718 10/854,520 7,631,190 7,557,941 7,757,086 10/854,5017,266,661 7,243,193 10/854,518 7,448,734 7,425,050 7,364,263 7,201,4687,360,868 7,234,802 7,303,255 7,287,846 7,156,511 10/760,264 7,258,4327,097,291 7,645,025 10/760,248 7,083,273 7,367,647 7,374,355 7,441,8807,547,092 10/760,206 7,513,598 10/760,270 7,198,352 7,364,264 7,303,2517,201,470 7,121,655 7,293,861 7,232,208 7,328,985 7,344,232 7,083,2727,621,620 7,669,961 7,331,663 7,360,861 7,328,973 7,427,121 7,407,2627,303,252 7,249,822 7,537,309 7,311,382 7,360,860 7,364,257 7,390,0757,350,896 7,429,096 7,384,135 7,331,660 7,416,287 7,488,052 7,322,6847,322,685 7,311,381 7,270,405 7,303,268 7,470,007 7,399,072 7,393,0767,681,967 7,588,301 7,249,833 7,524,016 7,490,927 7,331,661 7,524,0437,300,140 7,357,492 7,357,493 7,566,106 7,380,902 7,284,816 7,284,8457,255,430 7,390,080 7,328,984 7,350,913 7,322,671 7,380,910 7,431,4247,470,006 7,585,054 7,347,534 7,441,865 7,469,989 7,367,650 6,454,4826,808,330 6,527,365 6,474,773 6,550,997 7,093,923 6,957,923 7,131,7247,396,177 
7,168,867 7,125,098

BACKGROUND OF THE INVENTION

Virtual reality completely occludes a person's view of their physical reality (usually with goggles or a helmet) and substitutes an artificial, or virtual, view projected onto the inside of an opaque visor. Augmented reality changes a user's view of the physical environment by adding virtual imagery to the user's field of view (FOV).

Augmented reality typically relies on either a see-through Head Mounted Display (HMD) or a video-based HMD. A video-based HMD captures video of the user's field of view, augments it with virtual imagery, and redisplays it for the user's eyes to see. A see-through HMD, as discussed above, optically combines virtual imagery with the user's actual field of view. A video-based HMD has the advantage that registration between the real world and the virtual imagery is relatively easy to achieve, since parallax due to eye position relative to the HMD does not occur. It has the disadvantage that it is typically bulky and has a narrow field of view, and typically provides poor depth cues (i.e. a sense of depth or the distance from the eye to an object).

A see-through HMD has the advantage that it can be relatively less bulky with a wider field of view, and can provide good depth cues. It has the disadvantage that registration between the real world and the virtual imagery is difficult to achieve without intrusive calibration procedures and sophisticated eye tracking.

Registration between the real world and the virtual imagery can be provided by inertial sensors to track head movement, or by tracking fiducial markers positioned in the physical environment. The HMD uses the fiducials as reference points for the virtual imagery. A HMD often relies on inertial tracking to maintain registration during head movement, but this is a somewhat inaccurate approach.

The use of fiducials in the real world is less popular because fiducial tracking is usually not fast enough for typical user head movements, fiducials are typically sparsely placed making fiducial detection complex, and the fiducial encoding capacity is typically small, which limits the number of individual fiducials that can uniquely identify themselves. This can lead to fiducial ambiguity in large installations.

SUMMARY OF THE INVENTION

According to a first aspect, the present invention provides an augmented reality device for inserting virtual imagery into a user's view of their physical environment, the device comprising:

a display device through which the user can view the physical environment;

an optical sensing device for sensing at least one surface in the physical environment; and,

a controller for projecting the virtual imagery via the display device; wherein during use,

the controller uses wavefront modulation to match the curvature of the wavefronts of light reflected from the display device to the user's eyes with the curvature of the wavefronts of light that would be transmitted through the display device if the virtual imagery were situated at a predetermined position relative to the surface, such that the user sees the virtual imagery at the predetermined position regardless of changes in position of the user's eyes with respect to the see-through display.

The human visual system's ability to locate a point in space is determined by the center and radius of curvature of the wavefronts emitted by the point as they impinge on the eyes. A three-dimensional object can be thought of as an infinite number of point sources in space.

The present invention puts each pixel of the virtual image projected by the display device at a predetermined point relative to the sensed surface with a wavefront display that adjusts the curvature of the waves to correspond to the position of the point. This keeps the virtual image in registration with the user's field of view without first establishing (and maintaining) registration between the eye and the see-through display.
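
To illustrate the relationship just described, the following is a minimal sketch, not the patent's implementation, of the per-pixel calculation a wavefront modulator driver might perform: a virtual point source at distance d metres emits spherical wavefronts whose curvature at the viewer is 1/d dioptres, so each pixel's depth maps directly to a curvature setpoint. The depth_map input and the notion of a per-pixel curvature setpoint are assumptions for illustration.

    # Sketch: convert per-pixel virtual-point distances (metres) into
    # wavefront curvature setpoints (dioptres) for a hypothetical
    # wavefront modulator. A pixel at optical infinity needs a flat
    # (zero-curvature) wavefront.
    def wavefront_curvatures(depth_map_m):
        return [[0.0 if d == float('inf') else 1.0 / d for d in row]
                for row in depth_map_m]

    # A pixel anchored 0.5 m away needs a 2-dioptre divergent wavefront.
    print(wavefront_curvatures([[0.5, float('inf')]]))  # [[2.0, 0.0]]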

Optionally, the display device has a see-through display for one of the user's eyes. Alternatively, the display device has two see-through displays, one for each of the user's eyes respectively.

Optionally, the surface has a pattern of coded data disposed on it, such that the controller uses information from the coded data to identify the virtual imagery to be displayed.

Optionally, the display device, the optical sensing device and the controller are adapted to be worn on the user's head.

Optionally, the optical sensing device is camera-based and during use, provides identity and position data related to the coded surface to the controller for determining the virtual imagery displayed.

Optionally, the display device has a virtual retinal display (VRD) for each of the user's eyes, each of the VRDs scans at least one beam of light into a raster pattern and modulates the or each beam to produce spatial variations in the virtual imagery. Optionally, the VRD scans red, green and blue beams of light to produce color pixels in the raster pattern.

Optionally, the VRDs present a slightly different image to each of the user's eyes, the slight differences being based on eye separation, and the distance to the predetermined position of the virtual imagery to create a perception of depth via stereopsis.

Optionally, the wavefront modulator uses a deformable membrane mirror, liquid crystal phase corrector, a variable focus liquid lens or a variable focus liquid mirror.

Optionally, the virtual imagery is a movie, a computer application interface, computer application output, hand drawn strokes, text, images or graphics.

Optionally, the display device has pupil trackers to detect an approximate point of fixation of the user's gaze such that a virtual cursor can be projected into the virtual imagery and navigated using gaze direction.

Additional Aspects

Related aspects of the invention are set out below together with a discussion of their backgrounds to provide suitable context for the broad descriptions of these aspects.

Head Mounted Display with Coded Surface Sensor

BACKGROUND

As discussed above, the use of fiducials in the real world is less popular because fiducial tracking is usually not fast enough for typical user head movements, fiducials are typically sparsely placed making fiducial detection complex, and the fiducial encoding capacity is typically small, which limits the number of individual fiducials that can uniquely identify themselves. This can lead to fiducial ambiguity in large installations.

SUMMARY

Accordingly, this aspect provides an augmented reality device for a user in a physical environment with a coded surface, the device comprising:

a display device through which the user can view the physical environment;

an optical sensing device for sensing the coded surface; and,

a controller for determining an identity, position and orientation of the coded surface; wherein,

the controller projects virtual imagery via the display device such that the virtual imagery is viewed by the user in a predetermined position with respect to the coded surface.

By providing a coded surface instead of sparse fiducials, the invention avoids tracking and ambiguity problems. The relatively dense coding allows the surface to be accurately positioned and oriented to maintain registration with the virtual imagery.

Optionally, the display device has a see-through display for one of the user's eyes. Alternatively, the display device has two see-through displays, one for each of the user's eyes respectively.

Optionally, the augmented reality device further comprises a hand-held sensor for sensing and decoding information from the coded surface.

Optionally, the coded surface has first and second coded data disposed on it in first and second two dimensional patterns respectively, the first pattern having a scale sized such that the optical sensing device can capture images with a resolution suitable for the display device to decode the first coded data, and the second pattern having a scale sized such that the hand-held sensor can capture images with a resolution suitable for it to decode the second coded data.

Optionally, the hand-held sensor is an electronic stylus with a writing nib wherein during use, the stylus captures images of the second pattern when the nib is in contact with, or proximate to, the coded surface.

Optionally, the display device, the optical sensing device and the controller are adapted to be worn on the user's head.

Optionally, the optical sensing device is camera-based and during use, provides identity and position data related to the coded surface to the controller for determining the virtual imagery displayed.

Optionally, the display device has a virtual retinal display (VRD) for each of the user's eyes, each of the VRDs scans at least one beam of light into a raster pattern and modulates the or each beam to produce spatial variations in the virtual imagery. Optionally, the VRD scans red, green and blue beams of light to produce color pixels in the raster pattern.

Optionally, each of the virtual retinal displays has a wavefront modulator to match the curvature of the wavefronts of light reflected from the see-through display to the user's eyes with the curvature of the wavefronts of light that would be transmitted through the see-through display for that eye if the virtual imagery were actual imagery at a predetermined position relative to the coded surface, such that the user views the virtual imagery at the predetermined position regardless of changes in position of the user's eyes with respect to the see-through display.

Optionally, each of the virtual retinal displays presents a slightly different image to each of the user's eyes, the slight differences being based on eye separation, and the distance to the predetermined position of the virtual imagery to create a perception of depth via stereopsis.

Optionally, the wavefront modulator uses a deformable membrane mirror, liquid crystal phase corrector, a variable focus liquid lens or a variable focus liquid mirror.

Optionally, the virtual imagery is a movie, a computer application interface, computer application output, hand drawn strokes, text, images or graphics.

Optionally, the display device has pupil trackers to detect an approximate point of fixation of the user's gaze such that a virtual cursor can be projected into the virtual imagery and navigated using gaze direction.

Virtual Retinal Display with Occlusion Support

BACKGROUND

A virtual retinal display (VRD) projects a beam of light onto the eye, and scans the beam rapidly across the eye in a two-dimensional raster pattern. It modulates the intensity of the beam during the scan, based on a source video signal, to produce a spatially-varying image. The combination of human persistence of vision and a sufficiently fast and bright scan creates the perception of an object in the user's field of view.

The VRD renders occlusions as part of any displayed virtual imagery, according to the user's current viewpoint relative to their physical environment. It does not, however, intrinsically support occlusion parallax according to the position of the user's eye relative to the HMD unless it uses eye tracking for this purpose. In the absence of eye tracking, the HMD renders each VRD view according to a nominal eye position. If the actual eye position deviates from the assumed eye position, then the wavefront display nature of the VRD prevents misregistration between the real world and the virtual imagery, but in the presence of occlusions due to real or virtual objects, it may lead to object overlap or holes.

SUMMARY

Accordingly, this aspect provides an augmented reality device for inserting virtual imagery into a user's view, the device comprising:

an optical sensing device for optically sensing the user's physical environment; and,

a display device with a virtual retinal display for projecting a beam of light as a raster pattern of pixels, each pixel having a wavefront of light with a curvature that provides the user with spatial cues as to the perceived origin of the pixel such that the user perceives the virtual imagery to be at a predetermined location in the physical environment; wherein during use,

the virtual retinal display accounts for any occlusions that at least partially obscure the user's view of the perceived location of the virtual imagery by using a spatial light modulator that blocks occluded parts of the wavefront and allows non-occluded parts of the wavefront to pass.

To support occlusion parallax, the VRD can be augmented with a spatial light (amplitude) modulator (SLM) such as a digital micromirror device (DMD). The SLM can be introduced immediately after the wavefront modulator and before the raster scanner. The video generator provides the SLM with an occlusion map associated with each pixel in the raster pattern. The SLM passes non-occluded parts of the wavefront but blocks occluded parts. The amplitude-modulation capability of the SLM may be multi-level, and each map entry in the occlusion map may be correspondingly multi-level. However, in the limit case the SLM is a binary device, i.e. either passing light or blocking light, and the occlusion map is similarly binary.
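
The masking operation just described reduces to a per-sample multiply. The following is a minimal sketch under assumed data structures: wavefront_samples stands in for one pixel's sampled amplitude profile across the SLM aperture, occlusion_map is the per-pixel map from the video generator, and the helper name is hypothetical.

    # Sketch: amplitude-modulate one pixel's wavefront. A map entry of 1.0
    # passes the corresponding part of the wavefront, 0.0 blocks it, and
    # intermediate values model a multi-level SLM.
    def apply_occlusion(wavefront_samples, occlusion_map):
        assert len(wavefront_samples) == len(occlusion_map)
        return [a * m for a, m in zip(wavefront_samples, occlusion_map)]

    # Binary limit case: the left half of the wavefront is occluded.
    print(apply_occlusion([1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1]))  # [0.0, 0.0, 1.0, 1.0]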

Optionally, the VRD projects red, green and blue beams of light, the intensity of each beam being modulated to color each pixel of the raster pattern.

Optionally, the VRD has a video generator for providing the spatial light modulator with an occlusion map for each pixel of the raster pattern.

Optionally, the display device has a controller connected to the optical sensing device and an image generator for providing image data to the video generator in response to the controller, such that the virtual imagery is selected and positioned by the controller. Optionally, the controller has a data connection to an external source for receiving data related to the virtual imagery.

Optionally, the display device has a see-through display such that the VRD projects the raster pattern via the see-through display.

In a particularly preferred form the display device has two of the VRDs and two of the see-through displays, one VRD and see-through display for each eye.

Optionally, the occlusion is a physical occlusion or a virtual occlusion generated by the controller to at least partially obscure the virtual imagery.

Optionally, the display device and the optical sensing device are adapted to be worn on the user's head.

Optionally, the optical sensing device senses a surface in the physical environment, the surface having a pattern of coded data disposed on it, such that the display device uses information from the coded data to select and position the virtual imagery to be displayed.

Optionally, the optical sensing device is camera-based and during use, provides identity and position data related to the coded surface to the controller for determining the virtual imagery displayed.

Optionally, the VRD has a wavefront modulator to match the curvature of the wavefronts of light projected for each pixel in the raster pattern, with the curvature of the wavefronts of light that would be transmitted through the see-through display if the virtual imagery were actual imagery at a predetermined position relative to the coded surface, such that the user views the virtual imagery at the predetermined position regardless of changes in position of the user's eyes with respect to the see-through display.

Optionally, the spatial light modulator uses a digital micromirror device to create an occlusion shadow in the scanned raster pattern.

Optionally, the camera generates an occlusion map for the scanned raster patterns in the source video signal, and the spatial light modulator uses the occlusion map to control the digital micromirror device.

Optionally, each of the VRDs presents a slightly different image to each of the user's eyes, the slight differences being based on eye separation, and the distance to the predetermined position of the virtual imagery to create a perception of depth via stereopsis.

Optionally, the wavefront modulator has a deformable membrane mirror, liquid crystal phase corrector, a variable focus liquid lens or a variable focus liquid mirror.

Optionally, the virtual imagery is a movie, a computer application interface, computer application output, hand drawn strokes, text, images or graphics.

Optionally, the display device has pupil trackers to detect an approximate point of fixation of the user's gaze such that a virtual cursor can be projected into the virtual imagery and navigated using gaze direction.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be described by way of example only with reference to the accompanying drawings, in which:

FIG. 1 shows the structure of a complete tag;

FIG. 2 shows a symbol unit cell;

FIG. 3 shows nine symbol unit cells;

FIG. 4 shows the bit ordering in a symbol;

FIG. 5 shows a tag with all bits set;

FIG. 6 shows a tag group made up of four tag types;

FIG. 7 shows the continuous tiling of tag groups;

FIG. 8 shows the interleaving of codewords A, B, C & D within a tag;

FIG. 9 shows a codeword layout;

FIG. 10 shows a tag and its eight immediate neighbours, each labelled with its corresponding bit index;

FIG. 11 shows a user wearing a HMD with a single eye display;

FIG. 12 shows a user wearing a HMD with respective displays for each eye;

FIG. 13 is a schematic representation of a camera capturing light rays from two point sources;

FIG. 14 is a schematic representation of a display of the image of the two point sources captured by the camera of FIG. 13;

FIG. 15 is a schematic representation of a wavefront display of a virtual point source of light;

FIG. 16 is a diagrammatic representation of a HMD with a single eye display;

FIG. 17a schematically shows a wavefront display using a DMM;

FIG. 17b schematically shows the wavefront display of FIG. 17a with the DMM deformed to diverge the projected beam;

FIG. 18a schematically shows a wavefront display using a deformable liquid lens;

FIG. 18b schematically shows the wavefront display of FIG. 18a with the liquid lens deformed to diverge the projected beam;

FIG. 19 diagrammatically shows the modification to the HMD of FIG. 16 in order to support occlusions;

FIG. 20 schematically shows the wavefront display of FIG. 15 with occlusion support;

FIG. 21 schematically shows the wavefront display of FIG. 18b modified for occlusion support;

FIG. 22 is a diagrammatic representation of a HMD with a binocular display;

FIG. 23 shows a HMD directly linked to the Netpage server;

FIG. 24 shows the HMD linked to a Netpage Pen and a Netpage server via a communications network;

FIG. 25 shows a HMD linked to a Netpage relay which is in turn linked to a Netpage server via a communications network;

FIG. 26 schematically shows a HMD with an image warper;

FIG. 27 shows a HMD linked to cursor navigation and selection devices;

FIG. 28 shows a HMD with biometric sensors;

FIG. 29 shows a physical Netpage with pen-scale and HMD-scale tag patterns;

FIG. 30 shows the SVD on a printed Netpage;

FIG. 31 shows a printed calculator with an SVD for the display and a Netpage Pen;

FIG. 32 shows a printed form with an SVD for a text field displaying confidential information;

FIG. 33 shows the page of FIG. 29 with handwritten annotations captured as digital ink and shown as an SVD;

FIG. 34 shows a Netpage with static and dynamic page elements incorporated into the SVD;

FIG. 35 shows a mobile phone with a display screen printed with pen-scale and HMD-scale tag patterns;

FIG. 36 shows a mobile phone with an SVD that extends beyond the display screen;

FIG. 37 shows a mobile phone with a display screen and keypad provided by the SVD;

FIG. 38 shows a cinema screen with an HMD-scale tag pattern for screening movies as SVDs;

FIG. 39 shows a video monitor with an HMD-scale tag pattern for an SVD of a video signal from a range of sources; and

FIG. 40 shows a computer screen with pen-scale and HMD-scale tag patterns, and a tablet with a pen-scale tag pattern for an SVD of a keyboard.

DETAILED DESCRIPTION

As discussed above, the invention is well suited for incorporation in the Assignee's Netpage system. In light of this, the invention has been described as a component of a broader Netpage architecture. However, it will be readily appreciated that augmented reality devices have much broader application in many different fields. Accordingly, the present invention is not restricted to a Netpage context.

Additional cross referenced documents are listed at the end of the Detailed Description. These documents are predominantly non-patent literature and have been numbered for identification at the relevant part of the description. The disclosures of these documents are incorporated by cross reference.

Netpage Surface Coding

Introduction

This section defines a surface coding used by the Netpage system (described in co-pending application Docket No. NPS110US as well as many of the other cross referenced documents listed above) to imbue otherwise passive surfaces with interactivity in conjunction with Netpage sensing devices (described below).

When interacting with a Netpage coded surface, a Netpage sensing device generates a digital ink stream which indicates both the identity of the surface region relative to which the sensing device is moving, and the absolute path of the sensing device within the region.

Surface Coding

The Netpage surface coding consists of a dense planar tiling of tags. Each tag encodes its own location in the plane. Each tag also encodes, in conjunction with adjacent tags, an identifier of the region containing the tag. In the Netpage system, the region typically corresponds to the entire extent of the tagged surface, such as one side of a sheet of paper.

Each tag is represented by a pattern which contains two kinds of elements. The first kind of element is a target. Targets allow a tag to be located in an image of a coded surface, and allow the perspective distortion of the tag to be inferred. The second kind of element is a macrodot. Each macrodot encodes the value of a bit by its presence or absence.

The pattern is represented on the coded surface in such a way as to allow it to be acquired by an optical imaging system, and in particular by an optical system with a narrowband response in the near-infrared. The pattern is typically printed onto the surface using a narrowband near-infrared ink.

Tag Structure

FIG. 1 shows the structure of a complete tag 200. Each of the four black circles 202 is a target. The tag 200, and the overall pattern, has four-fold rotational symmetry at the physical level.

Each square region represents a symbol 204, and each symbol represents four bits of information. Each symbol 204 shown in the tag structure has a unique label 216. Each label 216 has an alphabetic prefix and a numeric suffix.

FIG. 2 shows the structure of a symbol 204. It contains four macrodots 206, each of which represents the value of one bit by its presence (one) or absence (zero).

The macrodot 206 spacing is specified by the parameter s throughout this specification. It has a nominal value of 143 μm, based on 9 dots printed at a pitch of 1600 dots per inch. However, it is allowed to vary within defined bounds according to the capabilities of the device used to produce the pattern.
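
As a quick check of the nominal value quoted above, the spacing follows directly from the dot pitch: nine dots at 1600 dots per inch is 142.875 μm, which rounds to the nominal 143 μm.

    # Worked arithmetic for the nominal macrodot spacing s.
    INCH_UM = 25400                 # micrometres per inch
    s_um = 9 * INCH_UM / 1600       # nine printed dots at 1600 dpi
    print(s_um)                     # 142.875, i.e. a nominal 143 um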

FIG. 3 shows an array 208 of nine adjacent symbols 204. The macrodot 206 spacing is uniform both within and between symbols 204.

FIG. 4 shows the ordering of the bits within a symbol 204.

Bit zero 210 is the least significant within a symbol 204; bit three 212 is the most significant. Note that this ordering is relative to the orientation of the symbol 204. The orientation of a particular symbol 204 within the tag 200 is indicated by the orientation of the label 216 of the symbol in the tag diagrams (see for example FIG. 1). In general, the orientation of all symbols 204 within a particular segment of the tag 200 is the same, consistent with the bottom of the symbol being closest to the centre of the tag.

Only the macrodots 206 are part of the representation of a symbol 204 in the pattern. The square outline 214 of a symbol 204 is used in this specification to more clearly elucidate the structure of a tag 200. FIG. 5, by way of illustration, shows the actual pattern of a tag 200 with every bit 206 set. Note that, in practice, a tag 200 will never have every bit 206 set.

A macrodot 206 is nominally circular with a nominal diameter of (5/9)s. However, it is allowed to vary in size by ±10% according to the capabilities of the device used to produce the pattern.

A target 202 is nominally circular with a nominal diameter of (17/9)s. However, it is allowed to vary in size by ±10% according to the capabilities of the device used to produce the pattern.

The tag pattern is allowed to vary in scale by up to ±10% according to the capabilities of the device used to produce the pattern. Any deviation from the nominal scale is recorded in the tag data to allow accurate generation of position samples.

Tag Groups

Tags 200 are arranged into tag groups 218. Each tag group contains four tags arranged in a square. Each tag 200 has one of four possible tag types, each of which is labelled according to its location within the tag group 218. The tag type labels 220 are 00, 10, 01 and 11, as shown in FIG. 6.

FIG. 7 shows how tag groups are repeated in a continuous tiling of tags, or tag pattern 222. The tiling guarantees that any set of four adjacent tags 200 contains one tag of each type 220.

Codewords

The tag contains four complete codewords. The layout of the four codewords is shown in FIG. 8. Each codeword is of a punctured 2⁴-ary (8, 5) Reed-Solomon code. The codewords are labelled A, B, C and D. Fragments of each codeword are distributed throughout the tag 200.

Two of the codewords are unique to the tag 200. These are referred to as local codewords 224 and are labelled A and B. The tag 200 therefore encodes up to 40 bits of information unique to the tag.

The remaining two codewords are unique to a tag type, but common to all tags of the same type within a contiguous tiling of tags 222. These are referred to as global codewords 226 and are labelled C and D, subscripted by tag type. A tag group 218 therefore encodes up to 160 bits of information common to all tag groups within a contiguous tiling of tags.

Reed-Solomon Encoding

Codewords are encoded using a punctured 2⁴-ary (8, 5) Reed-Solomon code. A 2⁴-ary (8, 5) Reed-Solomon code encodes 20 data bits (i.e. five 4-bit symbols) and 12 redundancy bits (i.e. three 4-bit symbols) in each codeword. Its error-detecting capacity is three symbols. Its error-correcting capacity is one symbol.

FIG. 9 shows a codeword 228 of eight symbols 204, with five symbols encoding data coordinates 230 and three symbols encoding redundancy coordinates 232. The codeword coordinates are indexed in coefficient order, and the data bit ordering follows the codeword bit ordering.

A punctured 2⁴-ary (8, 5) Reed-Solomon code is a 2⁴-ary (15, 5) Reed-Solomon code with seven redundancy coordinates removed. The removed coordinates are the most significant redundancy coordinates.

The code has the following primitive polynomial:

p(x) = x⁴ + x + 1   (EQ 1)

The code has the following generator polynomial:

g(x) = (x + α)(x + α²) . . . (x + α¹⁰)   (EQ 2)

For a detailed description of Reed-Solomon codes, refer to Wicker, S. B. and V. K. Bhargava, eds., Reed-Solomon Codes and Their Applications, IEEE Press, 1994, the contents of which are incorporated herein by reference.
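
The encoding defined by EQ 1 and EQ 2 can be exercised with a short sketch. The following is a minimal illustration, not the Netpage encoder itself: it builds GF(16) arithmetic from the primitive polynomial x⁴ + x + 1, forms the (15, 5) generator polynomial g(x), computes the ten redundancy symbols systematically, and then punctures the seven most significant of them, leaving the three redundancy symbols of the (8, 5) codeword as described above.

    # GF(16) antilog/log tables from p(x) = x^4 + x + 1 (0x13).
    EXP, LOG = [0] * 30, [0] * 16
    v = 1
    for i in range(15):
        EXP[i] = EXP[i + 15] = v
        LOG[v] = i
        v <<= 1
        if v & 0x10:
            v ^= 0x13

    def gf_mul(a, b):
        return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

    # g(x) = (x + a)(x + a^2)...(x + a^10), coefficients lowest degree first.
    g = [1]
    for i in range(1, 11):
        ng = [0] * (len(g) + 1)
        for k, c in enumerate(g):
            ng[k] ^= gf_mul(EXP[i], c)   # (a^i) * g[k] contributes to x^k
            ng[k + 1] ^= c               # x * g[k] contributes to x^(k+1)
        g = ng

    def rs85_encode(data):
        """Five 4-bit data symbols -> eight-symbol punctured codeword."""
        gh = g[::-1]                     # monic, degree 10, highest first
        rem = list(data) + [0] * 10      # data(x) * x^10
        for k in range(5):               # long division by g(x)
            lead = rem[k]
            if lead:
                for j in range(11):
                    rem[k + j] ^= gf_mul(lead, gh[j])
        # rem[5:] holds the ten redundancy symbols, most significant first;
        # puncturing removes the seven most significant.
        return list(data) + rem[12:]

    print(rs85_encode([1, 2, 3, 4, 5]))  # 5 data + 3 redundancy symbols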

The Tag Coordinate Space

The tag coordinate space has two orthogonal axes labelled x and y respectively. When the positive x axis points to the right, then the positive y axis points down.

The surface coding does not specify the location of the tag coordinate space origin on a particular tagged surface, nor the orientation of the tag coordinate space with respect to the surface. This information is application-specific. For example, if the tagged surface is a sheet of paper, then the application which prints the tags onto the paper may record the actual offset and orientation, and these can be used to normalise any digital ink subsequently captured in conjunction with the surface.

The position encoded in a tag is defined in units of tags. By convention, the position is taken to be the position of the centre of the target closest to the origin.

Tag Information Content

Table 1 defines the information fields embedded in the surface coding. Table 2 defines how these fields map to codewords.

TABLE 1: Field definitions

per codeword:

codeword type (2 bits): The type of the codeword, i.e. one of A (b′00′), B (b′01′), C (b′10′) and D (b′11′).

per tag:

tag type (2 bits): The type¹ of the tag, i.e. one of 00 (b′00′), 01 (b′01′), 10 (b′10′) and 11 (b′11′).

x coordinate (13 bits): The unsigned x coordinate of the tag².

y coordinate (13 bits): The unsigned y coordinate of the tag².

active area flag (1 bit): A flag indicating whether the tag is a member of an active area. b′1′ indicates membership.

active area map flag (1 bit): A flag indicating whether an active area map is present. b′1′ indicates the presence of a map (see next field). If the map is absent then the value of each map entry is derived from the active area flag (see previous field).

active area map (8 bits): A map³ of which of the tag's immediate eight neighbours are members of an active area. b′1′ indicates membership.

data fragment (8 bits): A fragment of an embedded data stream. Only present if the active area map is absent.

per tag group:

encoding format (8 bits): The format of the encoding. 0: the present encoding. Other values are TBA.

region flags (8 bits): Flags controlling the interpretation and routing of region-related information. 0: region ID is an EPC; 1: region is linked; 2: region is interactive; 3: region is signed; 4: region includes data; 5: region relates to mobile application. Other bits are reserved and must be zero.

tag size adjustment (16 bits): The difference between the actual tag size and the nominal tag size⁴, in 10 nm units, in sign-magnitude format.

region ID (96 bits): The ID of the region containing the tags.

CRC (16 bits): A CRC⁵ of tag group data.

total: 320 bits

¹ corresponds to the bottom two bits of the x and y coordinates of the tag
² allows a maximum coordinate value of approximately 14 m
³ FIG. 10 indicates the bit ordering of the map

FIG. 10 shows a tag 200 and its eight immediate neighbours, each labelled with its corresponding bit index in the active area map. An active area map indicates whether the corresponding tags are members of an active area. An active area is an area within which any captured input should be immediately forwarded to the corresponding Netpage server for interpretation. It also allows the Netpage sensing device to signal to the user that the input will have an immediate effect.

TABLE 2: Mapping of fields to codewords

codeword  codeword bits  field                  field width  field bits
A         1:0            codeword type (b′00′)  2            all
A         10:2           x coordinate           9            12:4
A         19:11          y coordinate           9            12:4
B         1:0            codeword type (b′01′)  2            all
B         2              tag type               1            0
B         5:2            x coordinate           4            3:0
B         6              tag type               1            1
B         9:6            y coordinate           4            3:0
B         10             active area flag       1            all
B         11             active area map flag   1            all
B         19:12          active area map        8            all
B         19:12          data fragment          8            all
C₀₀       1:0            codeword type (b′10′)  2            all
C₀₀       9:2            encoding format        8            all
C₀₀       17:10          region flags           8            all
C₀₀       19:18          tag size adjustment    2            1:0
C₀₁       1:0            codeword type (b′10′)  2            all
C₀₁       15:2           tag size adjustment    14           15:2
C₀₁       19:16          region ID              4            3:0
C₁₀       1:0            codeword type (b′10′)  2            all
C₁₀       19:2           region ID              18           21:4
C₁₁       1:0            codeword type (b′10′)  2            all
C₁₁       19:2           region ID              18           39:22
D₀₀       1:0            codeword type (b′11′)  2            all
D₀₀       19:2           region ID              18           57:40
D₀₁       1:0            codeword type (b′11′)  2            all
D₀₁       19:2           region ID              18           75:58
D₁₀       1:0            codeword type (b′11′)  2            all
D₁₀       19:2           region ID              18           93:76
D₁₁       1:0            codeword type (b′11′)  2            all
D₁₁       3:2            region ID              2            95:94
D₁₁       19:4           CRC                    16           all

⁴ the nominal tag size is 1.7145 mm (based on 1600 dpi, 9 dots per macrodot, and 12 macrodots per tag)
⁵ CCITT CRC-16 [7]

Note that the tag type can be moved into a global codeword to maximise local codeword utilization. This in turn can allow larger coordinates and/or 16-bit data fragments (potentially configurably in conjunction with coordinate precision). However, this reduces the independence of position decoding from region ID decoding and has not been included in the specification at this time.

Embedded Data

If the “region includes data” flag in the region flags is set then the surface coding contains embedded data. The data is encoded in multiple contiguous tags' data fragments, and is replicated in the surface coding as many times as it will fit.

The embedded data is encoded in such a way that a random and partial scan of the surface coding containing the embedded data can be sufficient to retrieve the entire data. The scanning system reassembles the data from retrieved fragments, and reports to the user when sufficient fragments have been retrieved without error.

As shown in Table 3, a 200-bit data block encodes 160 bits of data. The block data is encoded in the data fragments of a contiguous group of 25 tags arranged in a 5×5 square. A tag belongs to a block whose integer coordinate is the tag's coordinate divided by 5. Within each block the data is arranged into tags with increasing x coordinate within increasing y coordinate.

A data fragment may be missing from a block where an active area map is present. However, the missing data fragment is likely to be recoverable from another copy of the block.

Data of arbitrary size is encoded into a superblock consisting of a contiguous set of blocks arranged in a rectangle. The size of the superblock is encoded in each block. A block belongs to a superblock whose integer coordinate is the block's coordinate divided by the superblock size. Within each superblock the data is arranged into blocks with increasing x coordinate within increasing y coordinate.
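
The addressing rules above reduce to simple integer arithmetic. A minimal sketch follows; the helper names are hypothetical, while the 5×5 block geometry and the division rules come directly from the text.

    BLOCK_SIZE = 5  # tags per block side

    def block_of_tag(tag_x, tag_y):
        """A tag belongs to the block at its coordinate divided by 5."""
        return tag_x // BLOCK_SIZE, tag_y // BLOCK_SIZE

    def fragment_index(tag_x, tag_y):
        """Fragment index within the block: increasing x within increasing y."""
        return (tag_y % BLOCK_SIZE) * BLOCK_SIZE + (tag_x % BLOCK_SIZE)

    def superblock_of_block(block_x, block_y, sb_width, sb_height):
        """A block belongs to the superblock at its coordinate divided by
        the superblock size encoded in each block."""
        return block_x // sb_width, block_y // sb_height

    # Tag (12, 7) lies in block (2, 1) and carries fragment 12 of that block.
    print(block_of_tag(12, 7), fragment_index(12, 7))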

The superblock is replicated in the surface coding as many times as it will fit, including partially along the edges of the surface coding.

The data encoded in the superblock may include more precise type information, more precise size information, and more extensive error detection and/or correction data.

TABLE 3: Embedded data block

field              width  description
data type          8      The type of the data in the superblock. Values include: 0: type is controlled by region flags; 1: MIME. Other values are TBA.
superblock width   8      The width of the superblock, in blocks.
superblock height  8      The height of the superblock, in blocks.
data               160    The block data.
CRC                16     A CRC⁶ of the block data.
total              200

⁶ CCITT CRC-16 [7]
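
The block CRC (and the tag group CRC of Table 1) is specified only as "CCITT CRC-16 [7]"; reference [7] fixes the exact parameters, which are not reproduced in this section. As an illustration only, the sketch below implements the common CRC-CCITT variant with polynomial 0x1021 and initial value 0xFFFF; the initial value and bit ordering are assumptions, not the normative form.

    def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
        """Bitwise CRC-16, polynomial 0x1021 (assumed parameters)."""
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        return crc

    print(hex(crc16_ccitt(b"123456789")))  # 0x29b1 for this variant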

Cryptographic Signature of Region ID

If the “region is signed” flag in the region flags is set then the surface coding contains a 160-bit cryptographic signature of the region ID. The signature is encoded in a one-block superblock.

In an online environment any signature fragment can be used, in conjunction with the region ID, to validate the signature. In an offline environment the entire signature can be recovered by reading multiple tags, and can then be validated using the corresponding public signature key. This is discussed in more detail in the Netpage Surface Coding Security section of the cross referenced co-pending application Docket No. NPS100US, the content of which is incorporated within the present specification.

MIME Data

If the embedded data type is “MIME” then the superblock contains Multipurpose Internet Mail Extensions (MIME) data according to RFC 2045 (see Freed, N., and N. Borenstein, “Multipurpose Internet Mail Extensions (MIME)—Part One: Format of Internet Message Bodies”, RFC 2045, November 1996), RFC 2046 (see Freed, N., and N. Borenstein, “Multipurpose Internet Mail Extensions (MIME)—Part Two: Media Types”, RFC 2046, November 1996) and related RFCs. The MIME data consists of a header followed by a body. The header is encoded as a variable-length text string preceded by an 8-bit string length. The body is encoded as a variable-length type-specific octet stream preceded by a 16-bit size in big-endian format.
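
The header and body framing described above is a simple packing job. A minimal sketch, assuming an ASCII header and using the field widths stated in the text (the function name is hypothetical):

    import struct

    def encode_mime_payload(header: str, body: bytes) -> bytes:
        """8-bit header length + header, then 16-bit big-endian body size + body."""
        h = header.encode("ascii")
        assert len(h) < 256 and len(body) < 65536
        return struct.pack("B", len(h)) + h + struct.pack(">H", len(body)) + body

    # e.g. a directory-information payload of the kind described below
    payload = encode_mime_payload("text/directory", b"BEGIN:VCARD\r\nEND:VCARD\r\n")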

The basic top-level media types described in RFC 2046 include text, image, audio, video and application.

RFC 2425 (see Howes, T., M. Smith and F. Dawson, “A MIME Content-Type for Directory Information”, RFC 2425, September 1998) and RFC 2426 (see Dawson, F., and T. Howes, “vCard MIME Directory Profile”, RFC 2426, September 1998) describe a text subtype for directory information suitable, for example, for encoding contact information which might appear on a business card.

Encoding and Printing Considerations

The Print Engine Controller (PEC) supports the encoding of two fixed (per-page) 2⁴-ary (15, 5) Reed-Solomon codewords and six variable (per-tag) 2⁴-ary (15, 5) Reed-Solomon codewords. Furthermore, PEC supports the rendering of tags via a rectangular unit cell whose layout is constant (per page) but whose variable codeword data may vary from one unit cell to the next. PEC does not allow unit cells to overlap in the direction of page movement.

A unit cell compatible with PEC contains a single tag group consisting of four tags. The tag group contains a single A codeword unique to the tag group but replicated four times within the tag group, and four unique B codewords. These can be encoded using five of PEC's six supported variable codewords. The tag group also contains eight fixed C and D codewords. One of these can be encoded using the remaining one of PEC's variable codewords, two more can be encoded using PEC's two fixed codewords, and the remaining five can be encoded and pre-rendered into the Tag Format Structure (TFS) supplied to PEC.

PEC imposes a limit of 32 unique bit addresses per TFS row. The contents of the unit cell respect this limit. PEC also imposes a limit of 384 on the width of the TFS. The contents of the unit cell respect this limit.

Note that for a reasonable page size, the number of variable coordinate bits in the A codeword is modest, making encoding via a lookup table tractable. Encoding of the B codeword via a lookup table may also be possible. Note that since a Reed-Solomon code is systematic, only the redundancy data needs to appear in the lookup table.

Imaging and Decoding Considerations

The minimum imaging field of view required to guarantee acquisition of an entire tag has a diameter of 39.6s (i.e. (2×(12+2))√2 s), allowing for arbitrary alignment between the surface coding and the field of view. Given a macrodot spacing of 143 μm, this gives a required field of view of 5.7 mm.
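
The figures quoted above follow directly from the formula. A worked check, where the reading of the (12 + 2) term (twelve macrodot units per tag plus a two-unit allowance) is taken from the formula itself:

    import math

    S_UM = 143                                   # macrodot spacing (micrometres)
    diameter_s = 2 * (12 + 2) * math.sqrt(2)     # field-of-view diameter in units of s
    print(round(diameter_s, 1))                  # 39.6
    print(round(diameter_s * S_UM / 1000, 2))    # 5.66 mm, i.e. about 5.7 mm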

Table 4 gives pitch ranges achievable for the present surface coding for different sampling rates, assuming an image sensor size of 128 pixels.

TABLE 4: Pitch ranges achievable for the present surface coding for different sampling rates (dot pitch = 1600 dpi, macrodot pitch = 9 dots, viewing distance = 30 mm, nib-to-FOV separation = 1 mm, image sensor size = 128 pixels)

sampling rate  pitch range
2              −40 to +49
2.5            −27 to +36
3              −10 to +18

Given the present surface coding, the corresponding decoding sequence is as follows:

-   locate targets of complete tag
-   infer perspective transform from targets
-   sample and decode any one of tag's four codewords
-   determine codeword type and hence tag orientation
-   sample and decode required local (A and B) codewords
    -   codeword redundancy is only 12 bits, so only detect errors
    -   on decode error flag bad position sample
-   determine tag x-y location, with reference to tag orientation
-   infer 3D tag transform from oriented targets
-   determine nib x-y location from tag x-y location and 3D transform
-   determine active area status of nib location with reference to active area map
    -   generate local feedback based on nib active area status
-   determine tag type from A codeword
-   sample and decode required global (C and D) codewords (modulo window alignment, with reference to tag type)
    -   although codeword redundancy is only 12 bits, correct errors; subsequent CRC verification will detect erroneous error correction
-   verify tag group data CRC
    -   on decode error flag bad region ID sample
-   determine encoding type, and reject unknown encoding
-   determine region flags
-   determine region ID
-   encode region ID, nib x-y location, nib active area status in digital ink
-   route digital ink based on region flags

Note that region ID decoding need not occur at the same rate as position decoding.

Note that decoding of a codeword can be avoided if the codeword is found to be identical to an already-known good codeword.

Head Mounted Display

The Netpage system provides a paper- and pen-based interface to computer-based and typically network-based information and applications. The Netpage coding is discussed in detail above and the Netpage pen is described in the above cross referenced documents and in particular, a co-filed US application, temporarily identified here by its docket NPS109US.

The Netpage Head Mounted Display is an augmented reality device that can use surfaces coded with Netpage tag patterns to situate a virtual image in a user's field of view. The virtual imagery need not be in precise registration with the tagged surface, but can be ‘anchored’ to the tag pattern so that it appears to be part of the user's physical environment regardless of whether they change their direction of gaze.

Overview

A printed Netpage, when presented in a user's field of view (FOV), can be augmented with dynamic imagery virtually projected onto the page via a see-through head-mounted display (HMD) worn by the user. The imagery is selected according to the unique identity of the Netpage, and is virtually projected to match the three-dimensional position and orientation of the page with respect to the user. The imagery therefore appears locked to the surface of the page, even as the position and orientation of the page changes due to head or page movement. The HMD provides the correct stereopsis, vergence and accommodation cues to allow fatigue-free perception of the imagery “on” the surface. “Stereopsis”, “vergence” and “accommodation” relate to depth cues that the brain uses for three dimensional spatial awareness of objects in the FOV. These terms are explained below in the description of the Human Visual System.

Although the imagery is “attached” to the surface, it can still be three-dimensional and extend “out of” the surface. The page is coded with identity- and position-indicating tags in the usual way, but at a larger scale to allow longer-range acquisition. The HMD uses a Netpage sensor to image the tags and thereby identify the page and determine its position and orientation. If the page also supports pen interaction, then it may be coded with two sets of tags at different scales and utilising different infrared inks; or it may be coded with multi-resolution tags which can be imaged and decoded at multiple scales; or the HMD tag sensor can be adapted to image and decode pen-scale tags. In any case the whole page surface is ideally tagged so that it remains identifiable even when partially obscured, such as by another page or by the user's hand.

The Netpage HMD is lightweight and portable. It uses a radio interface to query a Netpage system and obtain static and dynamic page data. It uses an on-board processor to determine page position and orientation, and to project imagery in real time to minimise display latency.

The Netpage HMD, in conjunction with a suitable Netpage, therefore provides a situated virtual display (SVD) capability. The display is situated in that its location and content are page-driven. It is virtual in that it is only virtually projected on the page and is therefore only seen by the user. Note that the Netpage Viewer [8] and the Netpage Explorer [3] both provide Netpage SVD capabilities, but in more constrained forms.

An SVD can be used to display a video clip embedded in a printed news article; it can be used to show an object virtually associated with a page, such as a “pasted” photo; it can be used to show “secret” information associated with a page; and it can be used to show the page itself, for example in the absence of ambient light. More generally, an SVD can transform a page (or any surface) into a general-purpose display device, and more generally still, into a general-purpose computer system interface. SVDs can augment or subsume all current “display” applications, whether they be static or dynamic, passive or interactive, personal or shared, including such applications as commercial print publications, on-demand printed documents, product packaging, posters and billboards, television, cinema, personal computers, personal digital assistants (PDAs), mobile phones, smartphones and other personal devices. As well as augmenting the planar surfaces of essentially two-dimensional objects such as paper pages, SVDs can equally augment the multi-faceted or non-planar surfaces of three-dimensional objects.

Augmented reality in general typically relies on either a see-through HMD or a video-based HMD [15]. A video-based HMD captures video of the user's field of view, augments it with virtual imagery, and redisplays it for the user's eyes to see. A see-through HMD, as discussed above, optically combines virtual imagery with the user's actual field of view. A video-based HMD has the advantage that registration between the real world and the virtual imagery is relatively easy to achieve, since parallax due to eye position relative to the HMD doesn't occur. It has the disadvantage that it is typically bulky and has a narrow field of view, and typically provides poor depth cues.

As shown in FIGS. 11 and 12, a see-through HMD has the advantage that it can be relatively less bulky with a wider field of view, and can provide good depth cues. It has the disadvantage that registration between the real world and the virtual imagery is difficult to achieve without intrusive calibration procedures and sophisticated eye tracking. A HMD often relies on inertial tracking to maintain registration during head movement, since fiducial tracking is usually insufficiently fast, but this is a somewhat inaccurate approach.

In a basic form, the HMD 300 may have a single display 302 for one eye only. However, as shown in FIG. 12, by using a wavefront display 304, 306 for each eye respectively, the Netpage HMD 300 achieves perfect registration in a see-through display without calibration or tracking.

The use of fiducials in the real world to provide a basis for registration is well-established in augmented reality applications [15, 44]. However, fiducials are typically sparsely placed, making fiducial detection complex, and the fiducial encoding capacity is typically small, leading to a small fiducial identity space and fiducial ambiguity in large installations.

The surface coding used by the Netpage system is dense, overcoming sparseness issues encountered with fiducials. The Netpage system guarantees global identifier uniqueness, overcoming ambiguity issues encountered with fiducials. More broadly, the Netpage system provides the first systematic and practical mechanism for coding a significant proportion of the surfaces with which people interact on a day-to-day basis, providing an unprecedented opportunity to deploy augmented reality technology in a consumer setting. The scope of Netpage applications, and the universality of the devices used to interact with Netpage coded surfaces, makes the acquisition and assimilation of Netpage devices extremely attractive to consumers.

The tag image processing and decoding system developed for Netpage operates in real time at high-quality display frame rates (e.g. 100 Hz or higher). It therefore obviates the need for inaccurate inertial tracking.

The Human Visual System

The human eye consists of a converging lens system, made up of thecornea and crystalline lens, and a light-sensitive array ofphotoreceptors, the retina, onto which the lens system projects a realimage of the eye's field of view. The cornea provides a fixed amount offocus which constitutes over two thirds of the eye's focusing power,while the crystalline lens provides variable focus under the control ofthe ciliary muscles which surround it. When the muscles are relaxed thelens is almost flat and the eye is focused at infinity. As the musclescontract the lens bulges, allowing the eye to focus more closely. Thepoint of closest achievable focus, the near point, recedes with age. Itmay be less than 10 cm in a teenager, but usually exceeds 25 cm bymiddle age.

A diaphragm known as the iris controls the amount of light entering the eye and defines its entrance pupil. It can expand to as much as 8 mm in darkness and contract to as little as 2 mm in bright light.

The limits of the visual field of the eye are about 60 degrees upwards, 75 degrees downwards, 60 degrees inwards (in the nasal direction), and about 90 degrees outwards (in the temporal direction). The visual fields of the two eyes overlap by about 120 degrees centrally. This defines the region of binocular vision.

The retina consists of an uneven distribution of about 130 million photoreceptor cells. Most of these, the so-called rods, exhibit broad spectral sensitivity in the visible spectrum. A much smaller number (about 7 million), the so-called cones, variously exhibit three kinds of relatively narrower spectral sensitivity, corresponding to short, medium and long wavelength parts of the visible spectrum. The rods confer monochrome sensitivity in low lighting conditions, while the cones confer color sensitivity in relatively brighter lighting conditions. The human visual system effectively interpolates short, medium and long-wavelength cone stimuli in order to perceive spectral color.

The highest density of cones occurs in a small central region of the retina known as the macula. The macula contains the fovea, which in turn contains a tiny rod-free central region known as the foveola. The retina subtends about 3.3 degrees of visual angle per mm. The macula, at about 5 mm, subtends about 17 degrees; the fovea, at about 1.5 mm, about 5 degrees; and the foveola, at about 0.4 mm, about 1.3 degrees. The density of photoreceptors in the retina falls off gradually with eccentricity, in line with increasing photoreceptor size. A line through the center of the foveola and the center of the pupil defines the eye's visual axis. The visual axis is tilted inwards (in the nasal direction) by about 5 degrees with respect to the eye's optical axis.

The photoreceptors in the retina connect to about a million retinal ganglion cells which convey visual information to the brain via the optic nerve. The density of ganglion cells falls off linearly with eccentricity, and much more rapidly than the density of photoreceptors. This linear fall-off confers scale-invariant imaging. In the foveola, each ganglion cell connects to an individual cone. Elsewhere in the retina a single ganglion cell may connect to many tens of rods and cones. Foveal visual acuity peaks at around 4 cycles per degree, is a couple of orders of magnitude less at 30 cycles per degree, and is immeasurable beyond about 60 cycles per degree [33]. This upper limit is consistent with the maximum cone density in the foveola of around twice this number, and the corresponding ganglion cell density. Visual acuity drops rapidly with eccentricity. For a 5-degree visual field, it drops to 50% of peak acuity at the edges. For a 30-degree visual field, it drops to 5%.

The human visual system provides two distinct modes of visual perception, operating in parallel. The first supports global analysis of the visual field, allowing an object of interest to be detected, for example due to movement. The second supports detailed analysis of the object of interest.

In order to perceive and analyse an object of interest in detail, the head and/or the eyes are rapidly moved to align the eyes' visual axes with the object of interest. This is referred to as fixation, and allows high-resolution foveal imaging of the object of interest. Fixational movements, or saccades, and fixational pauses, during which foveal imaging takes place, are interleaved to allow the brain to perceive and analyse an extended object in detail. An initial gross saccade of arbitrary magnitude provides initial fixation. This is followed by a series of finer saccades, each of at most a few degrees, which scan the object onto the foveola. Microsaccades, a fraction of a degree in extent, are implicated in the perception of very fine detail, such as individual text characters. An ocular tremor, known as nystagmus, ensures continuous relative movement between the retina and a fixed scene. Without this tremor, retinal adaptation would cause the perceived image to fade out.

Although peripheral attention usually leads to foveal attention via fixation, the brain is also capable of attending to a peripheral point of interest without fixating on it.

Light emitted by a point source creates a series of spherical wavefronts centered on the point source. When the wavefronts impinge on the human eye, the human visual system is able to change the shape of the crystalline lens to bring the wavefronts to a point of focus on the retina. This is referred to as accommodation. The curvature of each wavefront as it impinges on the eye is the inverse of the distance from the point source to the eye. The smaller the distance, the greater the wavefront curvature, and the greater the accommodation required. The greater the distance, the flatter the wavefronts, and the smaller the accommodation required.
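
By way of illustration, this inverse relationship can be expressed in dioptres, the standard optical unit (the reciprocal of distance in metres); the unit convention, not the relationship itself, is an editorial assumption. A minimal sketch:

    # Wavefront curvature at the eye, in dioptres, for a point source at a
    # given distance. The dioptre convention is assumed for illustration.
    def wavefront_curvature(distance_m: float) -> float:
        return 1.0 / distance_m

    for d in (0.25, 0.5, 2.0, 10.0):
        print(f"{d} m -> {wavefront_curvature(d):.2f} D")
    # 0.25 m -> 4.00 D (strong accommodation); 10 m -> 0.10 D (nearly flat)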

In order to fixate on a point source, the human visual system rotates each eye so that the point source is aligned with the visual axis of each eye. This is referred to as vergence. Vergence in turn helps control the accommodation response, and a mismatch between vergence and accommodation cues can therefore cause eye strain.

The state of accommodation and vergence of the eyes in turn provides the visual system with a cue to the distance from the eyes to the point source, i.e. with a sense of depth.

The disparity between the relative positions of multiple point sources in the two eyes' fields of view provides the visual system with a cue to their relative depth. This disparity is referred to as binocular parallax. The visual system's process of fusing the inputs from the two eyes and thereby perceiving depth is referred to as stereopsis. Stereopsis in turn helps achieve vergence and accommodation.

Binocular parallax and motion parallax, i.e. parallax induced by relative motion, are the two most powerful depth cues used by the human visual system. Note that parallax may also lead to an occlusion disparity.

The visual system's ability to locate a point source in space is therefore determined by the center and radius of curvature of the wavefronts emitted by the point source as they impinge on the eyes. Furthermore, the discussion of point sources applies equally to extended objects in general, by considering the surface of each extended object as consisting of an infinite number of point sources. In practice, due to the finite resolving power of the visual system, a finite number of point sources suffices to model an extended object.

Persistence of vision describes the inability of the human visual system, and the retina in particular, to detect changes in intensity occurring above a certain critical frequency. This critical fusion frequency (CFF) is between 50 and 60 Hz, and is somewhat dependent on contrast and luminance conditions. It provides the basis for the human visual system's flicker-free perception of projected film and video.

Three-Dimensional Displays

If one imagines a spherical camera capable of capturing three-dimensional images of its surrounding space, and a corresponding spherical display capable of displaying them, then a defining characteristic of the display is that it becomes invisible when placed in the same location as the camera, no matter how it is viewed. The display emits the same light as would have been emitted by the space it occupies had it not been present. More conventionally, one can imagine a camera surface capable of recording all light penetrating it from one side, and a corresponding display surface capable of emitting corresponding light. This is illustrated in FIG. 13, where the camera 308 is shown capturing a subset of rays 310 emitted by a pair of point sources 312. FIG. 14 shows the display 314 emitting corresponding rays 316. In reality, a larger number of rays are captured and displayed than shown in FIG. 14, so a viewer will perceive the point sources 312 as being correctly located at fixed points in three-dimensional space, independently of viewing position.

The capture and manipulation of true three-dimensional image data has been the subject of much research in recent years, mainly for the purpose of constructing novel views. The images captured by an infinite collection of infinitely small spherical cameras define the so-called plenoptic function [42], while the light penetrating an arbitrary surface in three dimensions defines a so-called light field [36, 30]. Both functions, although theoretically continuous, are typically discretized for practical manipulation, and are resampled to construct novel views. Although the discussion so far has posited a 3D camera, the camera can be virtual, and a light field can be generated from a virtual 3D model.

A light field has the advantage that it captures both position and occlusion parallax. It has the disadvantage that it is data-intensive compared with a traditional 2D image. Conceptually, compared with a view-dependent 2D image, a discretized view-independent light field is defined by an array of 2D images, each image corresponding to a pixel in the view-dependent image. Although a light field can be used to generate a 2D image for a novel view, it is expensive to directly display a 2D light field. Because of this, 3D light field displays such as the lenslet display described in [35] only support relatively low spatial resolution. Furthermore, although the light field samples can be seen as samples of a suitably low-pass filtered set of wavefronts, the discrete light field display does not reconstruct the continuous wavefronts which the samples represent, relying instead on approximate integration by the human visual system.

Synthetic holographic displays have similar resolution problems [52].

FIG. 15 shows a simple wavefront display 322 of a virtual point source of light 318. In contrast to a discrete light field display, a wavefront display emits a set of continuous spherical wavefronts 324. The centre of curvature of each wavefront in the set is the virtual point source of light 318. If the virtual point 318 were an actual point, it would be emitting spherical wavefronts 320. The wavefronts 324 emitted from the display 322 are equivalent to the virtual wavefronts 320 had they passed through the display 322.

The advantage of the wavefront display 322 is that the description of the input 3D image is much smaller than the description of the corresponding light field, since it consists of a 2D image augmented with depth information. The disadvantage of this representation is that it fails to represent occlusion parallax. However, in applications where occlusion parallax is not important, the wavefront display has clear advantages.
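
The size difference can be made concrete with a back-of-the-envelope comparison. The resolutions and bit depths below are illustrative assumptions, not values from the text; only the structural contrast (an array of 2D images versus one 2D image plus depth) comes from the discussion above:

    # Hypothetical sizing sketch. A discretized light field stores one 2D
    # image per pixel of a view-dependent image; a wavefront display needs
    # only a single 2D image plus per-pixel depth.
    w = h = 2000                    # display resolution (assumed)
    vw = vh = 32                    # per-pixel view-image resolution (assumed)
    bytes_rgb, bytes_depth = 3, 2   # 24-bit colour, 16-bit depth (assumed)

    light_field = w * h * vw * vh * bytes_rgb            # ~12 GB
    image_plus_depth = w * h * (bytes_rgb + bytes_depth) # ~20 MB
    print(f"light field:   {light_field / 1e9:.1f} GB")
    print(f"image + depth: {image_plus_depth / 1e6:.1f} MB")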

A volumetric display acts as a simple wavefront display [24], but has the disadvantage that the volume of the display must encompass the volume of the virtual object being displayed.

A virtual retinal display [27], as discussed in the next section, can act as a simple wavefront display when augmented with a wavefront modulator [43]. Unlike a volumetric display, it can simulate arbitrary depth. It can be further augmented with a spatial light modulator [32] to support occlusions.

Many simpler display technologies have been developed which provide some of the cues used by the human visual system to perceive depth. These display technologies are predominantly stereoscopic, i.e. they present a different view to each eye and rely on binocular disparity to stimulate depth perception. In a stereoscopic head-mounted display, left and right views are presented directly to each eye. Left and right views may also be spectrally multiplexed on a conventional display and viewed through glasses with a different filter for each eye, or time-multiplexed on a conventional display and viewed through glasses which shutter each eye in alternating fashion. Polarization is also commonly used for view separation. In an autostereoscopic display, so called because it allows stereoscopic viewing without encumbering the viewer with headgear or eyewear, strips of the left and right view images are typically interleaved and displayed together. When viewed through a parallax barrier or a lenticular array, the left eye sees only the strips comprising the left image, and the right eye sees only the strips comprising the right image. These displays often only provide horizontal parallax, only support limited variation in the position and orientation of the viewer, and only provide two viewing zones, i.e. one for each eye. As discussed above, arrays of lenslets can be used to directly display light fields and thus provide omnidirectional parallax [35], dynamic parallax barrier methods can be used to support wider movement of a single tracked viewer [50], and multi-projector lenticular displays can be used to provide a larger number of viewing zones to multiple simultaneous viewers [40]. In a head-mounted display, motion parallax results from rendering views according to the tracked position and orientation of the viewer, whereas in a multiview autostereoscopic system, motion parallax is intrinsic although typically of lower quality.

The Netpage Head-Mounted Display

The Netpage HMD utilises a virtual retinal display⁷ (VRD) for each eye. A VRD projects a beam of light directly onto the eye, and scans the beam rapidly across the eye in a two-dimensional raster pattern. It modulates the intensity of the beam during the scan, based on a source video signal, to produce a spatially-varying image. The combination of human persistence of vision and a sufficiently fast and bright scan creates the perception of an object in the user's field of view. ⁷Also referred to as a Retinal Scanning Display (RSD).

The VRD utilises independent red, green and blue beams to create a colour display. The tri-stimulus nature of the human visual system allows a red-green-blue display system to stimulate the perception of most perceptible colours. Although a colour display capability is preferred, a monochromatic display capability also has utility.

Rendering the image presented to each eye differently according to eye separation and virtual object depth creates the perception of depth via stereopsis. Adjusting the projection angle into each eye to allow correct vergence further enhances depth perception, as does adjusting the divergence of each beam to allow correct accommodation. Apart from reinforcing depth perception, consistent depth cues maximise viewer comfort.

Key to the operation of the Netpage HMD is the registration of the image projected by the VRD with the surface of the Netpage onto which the image is being virtually projected. By operating as a limited wavefront display, a VRD allows this registration to be achieved without requiring registration between the eye and the VRD. In this regard it differs from screen-based HMDs, which require careful calibration or monitoring of eye position relative to the HMD to achieve and maintain registration. Thus the view-independent nature of a wavefront display is exploited to avoid registration between the eye and the HMD, rather than its more conventional purpose of avoiding a HMD altogether in the context of an autostereoscopic display. As an alternative to exploiting a VRD for this purpose, a view-independent light field display can also be used, using a much faster laser scan.

A VRD provides only a limited wavefront display capability because of practical limits on the size of its exit pupil. Ideally its exit pupil is large enough to cover the eye's maximum entrance pupil, at any allowed position relative to the display. The position of the eye's pupil relative to the display can vary due to eye movements, variations in the placement of the HMD, and variations in individual human anatomy. In practice it is advantageous to track the approximate gaze direction of the eye relative to the display, so that limited system resources can be dedicated to generating display output where it will be seen and/or at an appropriate resolution.

Tracking the pupil also allows the system to determine an approximate point of fixation, which it can use to identify a document of interest. In a Netpage context, projecting virtual imagery onto the surface region to which the user is directing foveal attention is most important. It is less critical to project imagery into the periphery of the user's field of view. Gaze tracking can also be used to navigate a virtual cursor, or to indicate an object to be selected or otherwise activated, such as a hyperlink.

In a Netpage context, the surface onto which the virtual imagery is being projected can generally be assumed to be planar, and for most applications the projected virtual object can similarly be assumed to be planar. This simplifies the wavefront display requirements of the Netpage HMD. In particular, the wavefront curvature is not required to vary abruptly within a scanline. Alternatively, if the curvature modulation mechanism is slow, then the wavefront curvature can be fixed for an entire frame, e.g. based on the average depth of the virtual object. If the wavefront curvature cannot be varied automatically at all, then the system may still provide the user with a manual adjustment mechanism for setting the curvature, e.g. based on the user's normal viewing distance. Alternatively, the wavefront curvature may be fixed by the system based on a standard viewing distance, e.g. 50 cm, to maximise viewer comfort.
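
The paragraph above describes a graceful-degradation ladder for curvature control. The following sketch restates that ladder in code form; the function name, rate comparisons and return conventions are illustrative assumptions, not part of the described system:

    # Decision ladder for wavefront curvature control (names are illustrative).
    def curvature_strategy(modulator_rate_hz, line_rate_hz, frame_rate_hz,
                           modulator_adjustable=True, user_distance_m=None):
        if not modulator_adjustable:
            # Manual user setting, else the standard 50 cm viewing distance.
            distance = user_distance_m if user_distance_m else 0.5
            return ("fixed", 1.0 / distance)
        if modulator_rate_hz >= line_rate_hz:
            return ("ramp-per-scanline", None)   # linear depth change in a line
        if modulator_rate_hz >= frame_rate_hz:
            return ("constant-per-frame", None)  # average depth of the object
        distance = user_distance_m if user_distance_m else 0.5
        return ("fixed", 1.0 / distance)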

FIG. 16 shows a block diagram of a VRD suitable for use in the Netpage HMD, similar in structure to the VRDs described in [27, 28, 37 and 38].

The VRD as a whole scans a light beam across the eye 326 in a two-dimensional raster pattern. The eye 326 focuses the beam 390 onto the retina to produce a spot which traces out the raster pattern over time. At any given time, the intensity of the beam and hence the spot represents the value of a single colour pixel in a two-dimensional input image. Human persistence of vision fuses the moving spot into the perception of a two-dimensional image. The required pixel rate of the VRD is the product of the image resolution and the frame rate. The frame rate in turn is at least as high as the critical fusion frequency, and ideally higher (e.g. 100 Hz or more). By way of example, a frame rate of 100 Hz and a spatial resolution of 2000 pixels by 2000 pixels gives a pixel rate of 400 MHz and a line rate of 200 kHz.
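
The worked example above can be reproduced directly; the figures below are the ones given in the text:

    # Pixel rate = width x height x frame rate; line rate = lines x frame rate.
    frame_rate_hz = 100
    width_px, height_px = 2000, 2000

    pixel_rate_hz = width_px * height_px * frame_rate_hz  # 400 MHz
    line_rate_hz = height_px * frame_rate_hz              # 200 kHz
    print(f"pixel rate: {pixel_rate_hz/1e6:.0f} MHz, "
          f"line rate: {line_rate_hz/1e3:.0f} kHz")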

A video generator 328 accepts a stream of image data 330 and generates the requisite data and control signals 332 for displaying the image data 330.

Light beam generators 334 generate red, green and blue beams 336, 338 and 340 respectively. Each beam generator 334 has a matching intensity modulator 342, for modulating the intensity of each beam according to the corresponding component of the pixel colour 344 supplied by the video generator 328.

The beam generator 334 may be a gas or solid-state laser, a light-emitting diode (LED), or a super-luminescent LED. The intensity modulator 342 may be intrinsic to the beam generator or may be a separate device. For example, a gas laser may rely on a downstream acousto-optic modulator (AOM) for intensity modulation, while a solid-state laser or LED may intrinsically allow intensity modulation via its drive current.

Although FIG. 16 shows multiple beam generators 334 and colour intensity modulators 342, a single monochrome beam generator may be utilised if colour projection is not required.

Furthermore, multiple beam generators and intensity modulators may be utilised in parallel to achieve a desired pixel rate. In general, any component of the VRD whose fundamental operating rate limits the achievable pixel rate may be replicated, and the replicated components operated in parallel, to achieve a desired pixel rate.

A beam combiner 346 combines the intensity modulated coloured beams 348, 350 and 352 into a single beam 354 suitable for scanning. The beam combiner may utilise multiple beam splitters.

A wavefront modulator 356 accepts the collimated input beam 354 and modulates its wavefront to induce a curvature which is the inverse of the pixel depth signal 358 supplied by the video generator 328. The pixel depth 358 is clipped at a reasonable depth, beyond which the wavefront modulator 356 passes a collimated beam. The wavefront modulator 356 may be a deformable membrane mirror (DMM) [43, 51], a liquid-crystal phase corrector [47], a variable focus liquid lens or mirror operating on an electrowetting principle [16, 25], or any other suitable controllable wavefront modulator. Depending on the time constant of the modulator 356, it may be utilised to effect pixel-wise, line-wise or frame-wise wavefront modulation, corresponding to pixel-wise, line-wise or frame-wise constant depth. Furthermore, as mentioned earlier, multiple wavefront modulators may be utilised in parallel to achieve higher-rate wavefront modulation. If the operation of the wavefront modulator is wavelength-dependent, then multiple wavefront modulators may be employed beam-wise before the beams are combined. Even if the wavefront modulator is incapable of random pixel-wise modulation, it may still be capable of ramped modulation corresponding to the linear change of depth within a single scanline of the projection of a planar object.
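
A minimal sketch of the depth-to-curvature mapping just described, with clipping; the clip distance is an assumption for illustration, since the text only calls it "a reasonable depth":

    MAX_DEPTH_M = 10.0  # assumed clip depth; beyond this, pass a collimated beam

    def modulator_curvature(pixel_depth_m: float) -> float:
        """Target wavefront curvature (dioptres) for one pixel."""
        if pixel_depth_m >= MAX_DEPTH_M:
            return 0.0               # collimated beam, i.e. infinite depth
        return 1.0 / pixel_depth_m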

FIG. 17a shows a simplified schematic of a DMM 360 used as a wavefront modulator (see FIG. 16). When the DMM 360 is flat, i.e. with no applied voltage (shown on the left), it reflects a collimated beam 362. This corresponds to infinite pixel depth. FIG. 17b shows the DMM 360 deformed with an applied voltage. The deformed DMM now reflects a converging beam 364 which becomes a diverging beam 368 beyond the focal point 366. This corresponds to a particular finite pixel depth.

FIG. 18a shows a simplified schematic of a variable focus liquid lens 370 used as a wavefront modulator (and as part of the beam expander). The lens is at rest with no applied voltage and produces a converging beam 364 which is collimated by the second lens 372. FIG. 18b shows the lens 370 deformed by an applied voltage so that it produces a more converging beam 364 which is only partially collimated by the second lens 372, still producing a diverging beam 368. A similar configuration can be used with a variable focus liquid mirror instead of a liquid lens.

Referring again to FIG. 16, a horizontal scanner 374 scans the beam in a horizontal direction, while a subsequent vertical scanner 376 scans the beam in a vertical direction. Together they steer the beam in a two-dimensional raster pattern. The horizontal scanner 374 operates at the pixel rate of the VRD, while the vertical scanner operates at the line rate. To prevent possible beating between the frame rate and the frequency of microsaccades, which are of the same order, it is useful for the pixel-rate scan to occur horizontally with respect to the eye, since many detail-oriented microsaccades, such as occur during reading, are horizontal.

The horizontal scanner may utilise a resonant scanning mirror, as described in [37]. Alternatively, it may utilise an acousto-optic deflector, as described in [27, 28], or any other suitable pixel-rate scanner, replicated as necessary to achieve the desired pixel rate.

Although FIG. 16 shows distinct horizontal and vertical scanners, the two scanners may be combined in a single device such as a biaxial MEMS scanner, as described in [37].

Similarly, although FIG. 16 shows the video generator 328 producing video timing signals 378 and 380, it may be convenient to derive video timing from the operation of the horizontal scanner 374 if it utilises a resonant design, since a resonant scanner's frequency is determined mechanically. Furthermore, since a resonant scanner generates a sinusoidal scan velocity, pixel durations must be varied accordingly to ensure that their spatial extent is constant [54].
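
The variable pixel timing follows from the sinusoidal scan: the beam position varies as the sine of time, so equally spaced pixel boundaries correspond to unequally spaced boundary-crossing times. The sketch below illustrates this; the scanner frequency, pixel count and usable-swing fraction are assumptions for illustration:

    import numpy as np

    f_scan = 100e3   # scanner frequency: one line per half period at 200 kHz line rate (assumed)
    pixels = 2000
    swing = 0.9      # use the central 90% of the sinusoidal swing (assumed)

    x = swing * np.linspace(-1.0, 1.0, pixels + 1)  # equally spaced pixel boundaries
    t = np.arcsin(x) / (2 * np.pi * f_scan)         # boundary-crossing times
    durations = np.diff(t)                          # per-pixel dwell times
    # Centre pixels, where the scan is fastest, get the shortest durations.
    print(f"centre/edge duration ratio: {durations.min() / durations.max():.2f}")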

An optional eye tracker 382 determines the approximate gaze direction 384 of the eye 326. It may image the eye to detect the position of the pupil as well as the position of the corneal reflection of an infrared light source, to determine the approximate gaze direction. Typical corneal reflection eye tracking systems are described in [20, 34]. Eye tracking in general is discussed in [23].

Multiple off-axis light sources may be positioned within the HMD, as prefigured in [14]. These can be lit in succession, so that each successive image of the eye contains the reflection of a single light source. The reflection data resulting from multiple successive images can then be combined to determine gaze direction 384, either analytically or using least squares adjustment, without requiring prior calibration of eye position with respect to the HMD. An image of the infrared corneal reflection of a Netpage coded surface in the user's field of view may also serve as the basis for un-calibrated detection of gaze direction.

If the gaze direction 384 of both eyes is tracked, then the resultant two fixation points can be averaged to determine the likely true fixation point.

The tracked gaze direction 384 may be low-pass filtered to suppress fine saccades and microsaccades.

An optional beam offsetter 386 acts on the gaze direction 384 provided by the eye tracker 382 to align the beam with the pupil of the eye 326. The gaze direction 384 is simultaneously used by a high-level image generator to generate virtual imagery offset correspondingly.

Projection optics 388 finally project the beam 390 onto the eye 326, magnifying the scan angle to provide the required field of view angle. The projection optics include a visor-shaped optical combiner which simultaneously reflects the generated imagery onto the eye while passing light from the environment. The VRD thereby acts as a see-through display. The visor is ideally curved, so that it magnifies the projected imagery to fill the field of view.

The HMD as a whole, discussed below, ensures that the projected imagery is registered with a physical Netpage coded surface in the user's field of view. The optical transmission of the combiner may be fixed, or it may be variable in response to active control or ambient light levels. For example, it may incorporate a liquid-crystal layer switchable between transmissive and opaque states, either under user or software control. Alternatively or additionally, it may incorporate a photochromic material whose opacity is a function of ambient light levels.

The HMD correctly renders occlusions as part of any displayed virtual imagery, according to the user's current viewpoint relative to a tagged surface. It does not, however, intrinsically support occlusion parallax according to the position of the user's eye relative to the HMD unless it uses eye tracking for this purpose. In the absence of eye tracking, the HMD renders each VRD view according to a nominal eye position. If the actual eye position deviates from the assumed eye position, then the wavefront display nature of the VRD prevents misregistration between the real world and the virtual imagery, but in the presence of occlusions due to real or virtual objects, it may lead to object overlap or holes.

Referring to FIG. 19, the VRD can be further augmented with a spatial light (amplitude) modulator (SLM) such as a digital micromirror device (DMD) [32, 48] to support occlusion parallax. The SLM 392 is introduced immediately after the wavefront modulator 356 and before the raster scanner 374, 376. Alternatively, the SLM 392 is introduced immediately before the wavefront modulator (but after its beam expander). The video generator 328 provides the SLM 392 with an occlusion map 394 associated with the current pixel. The SLM passes non-occluded parts of the wavefront but blocks occluded parts. The amplitude-modulation capability of the SLM may be multi-level, and each map entry in the occlusion map may be correspondingly multi-level.

However, in the limiting case the SLM is a binary device, i.e. either passing light or blocking light, and the occlusion map is similarly binary.

To prevent holes appearing when a nominally invisible part of the virtual scene becomes visible due to eye movement, the HMD can make multiple passes to display multiple depth planes in the virtual scene. The HMD can either render and display each depth plane in its entirety, or can render and display only enough of each depth plane to support the maximum eye movement possible.

FIG. 20 shows the wavefront display of FIG. 14 augmented with support for displaying an occlusion 396.

FIG. 21 shows the DMM 360 of FIGS. 17a and 17b augmented with a DMD SLM 392 to produce a VRD with occlusion support. The “shadow” 398 of the virtual occlusion is a gap formed by the SLM 392 in the cross-section of the beam reflected by the DMM 360.

Per-pixel occlusion maps are easily calculated during rendering of a virtual model. They may also be derived directly from a depth image. Where the occluding object is an object in the real world, such as the user's hand (as discussed further below), it may be represented as an opaque black virtual object during rendering.
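
In the binary case, deriving an occlusion map from depth information reduces to a per-pixel depth comparison. A minimal sketch, assuming per-pixel depth arrays for both the foreground (real or virtual occluders) and the virtual object; the array names and shapes are illustrative:

    import numpy as np

    def occlusion_map(foreground_depth: np.ndarray,
                      virtual_depth: np.ndarray) -> np.ndarray:
        """True where the virtual pixel is occluded and the SLM should block."""
        return foreground_depth < virtual_depth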

Table 5 gives examples of the viewing angle associated with common media at various viewing distances. In the table, specified values are shown shaded, while derived values are shown un-shaded. For print media, various common viewing distances are specified and corresponding viewing angles are derived. Required VRD image sizes are then derived based on a maximum feature frequency of 30 cycles per degree. For display media, various common viewing angles are specified and corresponding viewing distances (and maximum feature frequencies) are derived. For both media types the corresponding surface resolution is also shown.

Based on their native resolution and human visual acuity, display media such as HDTV video monitors are suited to a viewing angle of between 30 and 40 degrees. This is consistent with viewing recommendations for such display media. Based on their native size and human accommodation limits, print media such as US Letter pages are also suited to a viewing angle of 30 to 40 degrees.

A VRD image size of around 2000 pixels by 2000 pixels is therefore adequate for virtualising these media. Significantly less is required if knowledge of gaze direction is used to project non-foveated parts of the image at lower resolution.
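
The 2000-pixel figure follows from the numbers already given: a maximum feature frequency of 30 cycles per degree, sampled at two pixels per cycle, across a 30 to 40 degree viewing angle. A short check:

    # Pixels across = viewing angle x cycles/degree x samples/cycle.
    max_cycles_per_degree = 30
    samples_per_cycle = 2
    for viewing_angle_deg in (30, 35, 40):
        px = viewing_angle_deg * max_cycles_per_degree * samples_per_cycle
        print(f"{viewing_angle_deg} deg -> {px} px")  # 1800, 2100, 2400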

TABLE 5 Viewing parameters for different media

⁸ In units of screen height
⁹ Per unit of screen height
¹⁰ THX recommends 36 degrees in back row of theatre
¹¹ SMPTE EG-18-1994 recommends 30 degrees viewing angle

FIG. 22 shows a block diagram of a Netpage HMD 300 incorporating dual VRDs 304 and 306 for binocular stereoscopic display as shown in FIG. 12. Dual earphones 800 and 802 provide stereophonic sound. Although dual VRDs are preferred, a single VRD providing a monoscopic display capability also has utility (see FIG. 11). Similarly, a single earphone also has utility.

Although VRDs or similar display devices are preferred for incorporation in the Netpage HMD because they allow the incorporation of wavefront curvature modulation, more conventional display devices such as liquid crystal displays may also be utilised, but with the added complexity of requiring more careful head and eye position calibration or tracking. Conventional LCD-based HMDs are described in detail in [45].

To maximise the operating range of the VRDs with respect to eye movement, and to maximise user comfort, the optical axes of the VRDs can be approximately aligned with the resting positions of the two eyes by adjusting the lateral separation of the VRDs and adjusting the tilt of the visor. This can be achieved as part of a fitting process and/or performed manually by the user at any time. Note again that the wavefront display capability of the VRDs means that these adjustments are not required to achieve registration of virtual imagery with the physical world.

A Netpage sensor 804 acquires images 806 of a Netpage coded surface in the user's field of view. It may have a fixed viewing direction and a relatively narrow field of view (of the order of the minimum field of view required to acquire and decode a tag); a variable viewing direction and a relatively narrow field of view; or a fixed viewing direction and a relatively wide field of view (of the order of the VRD viewing angle or even greater). In the first case, the user is constrained to interacting with a Netpage coded surface in the fixed and narrow field of view of the sensor, requiring the head to be turned to face the Netpage of interest. In the second case, the gaze-tracked fixation point can be used to steer the image sensor's field of view, for example via a tip-tilt mirror, allowing the user to interact with a Netpage by fixating on it. In the third case, the gaze-tracked fixation point can be used to select a sub-region of the sensor's field of view, again allowing the user to interact with a Netpage by fixating on it. In the second and third cases, and as described earlier, the user's effective viewing angle is widened by using the tracked gaze direction to offset the beam.

A controlling HMD processor 808 accepts image data 806 from the Netpage sensor 804. The processor locates and decodes the tags in the image data to generate a continuous stream of identification, position and orientation information for the Netpage being imaged. A suitable Netpage image sensor with an on-board image processor, and the corresponding image processing algorithm, tag decoding algorithm and pose (position and orientation) estimation algorithm, are described in [9, 59]. In the HMD 300, the image sensor resolution is higher than described in [9] to support a greater range of tag pattern scales. The sensor utilises a small aperture to ensure good depth of field, and an objective lens system for focusing, approximately as described in [4].

The Netpage sensor 804 incorporates a longpass or bandpass infrared filter matched to the absorption peak of the infrared ink used to encode the HMD-oriented Netpage tag pattern. It also includes a source of infrared illumination matched to the ink. Alternatively it relies on the infrared component of ambient illumination to adequately illuminate the tag pattern for imaging purposes. In addition, large and/or distant SVDs (such as cinema screens, billboards, and even video monitors) are usefully self-illuminating, either via front or back illumination, to avoid reliance on HMD illumination.

Alternatively or additionally to determining the actual viewing distance of the tagged surface by analysing the scale and perspective distortion of the tagged pattern images 806, the Netpage sensor 804 may include an optical range finder. Time-of-flight measurement of an encoded optical pulse train is a well-established technique for optical range finding, and a suitable system is described in [17].

The depth determined via the optical range finder can be used by the HMD to estimate the expected scale of the imaged tag pattern, thus making tag image processing more efficient, and it can be used to fix the z depth parameter during pose estimation, making the pose estimation process more efficient and/or accurate. It can also be used to adjust the focus of the Netpage sensor's optics, to provide greater effective depth of field, and can be used to change the zoom of the Netpage sensor's optics, to allow a smaller image sensor to be utilised across a range of viewing distances, and to reduce the image processing burden.

Zoom and/or focus control may be effected by moving a lens element, as well as by modulating the curvature of a deformable membrane mirror [43, 51], a liquid-crystal phase corrector [47], or other suitable device. Zoom may also be effected digitally, e.g. simply to reduce the image processing burden.

Range-finding, whether based on pose estimation or time-of-flight measurement, can be performed at multiple locations on a surface to provide an estimate of surface curvature. The available range data can be interpolated to provide range data across the entire surface, and the virtual imagery can be projected onto the resultant curved surface. The geometry of a tagged curved surface may also be known a priori, allowing proper projection without additional range-finding.

Rather than utilising a two-dimensional image sensor, the Netpage sensor 804 may instead utilise a scanning laser, as described in [5]. Since the image produced by the scanning laser is not distorted by perspective, pose estimation cannot be used to yield the z depth of the tagged surface. Optical (or other) range finding is therefore crucial in this case. Pose estimation may still be performed to determine three-dimensional orientation and two-dimensional position. The optical range finder may be integrated with the laser scanner, utilising the same laser source and photodetector, and operating in multiplexed fashion with respect to scanning.

The frame rate of the Netpage sensor 804 is matched to the frame rate of the image generator 328 (e.g. at least 50 Hz, but ideally 100 Hz or more), so that the displayed image is always synchronised with the position and orientation of the tagged surface. Decoding of the page identifier embedded in the surface coding can occur at a lower rate, since it changes much less often than position.

Decoding of the page identifier can be triggered when a tag pattern is re-acquired, and when the decoded position changes significantly. Alternatively, if the least significant bits of the page identifier are encoded in the same codewords which encode position, then full page identifier decoding can be triggered by a change in the least significant page identifier bits.
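
The triggering policy just described can be summarised as a simple predicate; the function name, parameters and jump threshold below are illustrative assumptions, not part of the described system:

    # Trigger full page-identifier decoding on re-acquisition, a large
    # position jump, or a change in the low identifier bits (all assumed names).
    def should_decode_page_id(reacquired: bool, position_jump: float,
                              id_low_bits: int, prev_id_low_bits: int,
                              jump_threshold: float = 50.0) -> bool:
        return (reacquired
                or position_jump > jump_threshold
                or id_low_bits != prev_id_low_bits)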

The imaging axis of the Netpage sensor emerges from the HMD 300 between and slightly above the eyes, and is roughly normal to the face. Alternatively, the Netpage sensor 804 is arranged to image the back of the visor, so that its imaging axis roughly coincides with one eye's resting optical axis.

Although the HMD 300 incorporates a single Netpage sensor 804, it may alternatively incorporate dual Netpage sensors and be configured to perform pose estimation across both image sensors' acquired images. It may also incorporate multiple tag sensors to allow tag acquisition across a wider field of view.

Various scenarios for connecting the HMD 300 to a Netpage server 812 are illustrated in FIG. 23, FIG. 24 and FIG. 25.

A radio transceiver 810 (see FIG. 22) provides a communications interface to a server such as a video server or a Netpage server 812. The architecture of the overall Netpage system with which the Netpage HMD 300 communicates is described in [1, 3].

The radio interface 810 may utilise any of a number of protocols and standards, including personal-area and local-area standards such as Bluetooth, IEEE 802.11, 802.15, and so on; and wide-area mobile standards such as GSM, TDMA, CDMA, GPRS, etc. It may also utilise different standards for outgoing and incoming communication, for example utilising a broadcast standard for incoming data, such as a satellite, terrestrial analogue or terrestrial digital standard.

The HMD 300 may effect communication with a server 812 in a multi-hop fashion, for example using a personal-area or local-area connection to communicate with a relay device 816 which in turn communicates with a server via communications network 814 for a longer-range connection. It may also utilise multiple layers of protocols, for example communicating with the server via TCP/IP overlaid on a point-to-point Bluetooth connection to a relay as well as on the broader Internet.

Alternatively or additionally, the HMD may utilise a wired connection to a relay or server, utilising one or more of a serial, parallel, USB, Ethernet, Firewire, analog video, and digital video standard.

The relay device 816 may, for example, be a mobile phone, personal digital assistant or a personal computer. The HMD may itself act as a relay for other Netpage devices, such as a Netpage pen [4], or vice versa.

In the Netpage architecture, the identifier of a Netpage is used to identify a corresponding server which is able to provide information about the page and handle interactions with the page. When the HMD first encounters a new page identifier, it looks up a corresponding server, for example via the DNS. Having identified a server, it retrieves static and/or dynamic data associated with the page from the server. Having retrieved the page data, an image generator 328 renders the page data stereoscopically for the two eyes according to the position and orientation of the Netpage with respect to the HMD, and optionally according to the gaze directions of the eyes. The generated stereo images include per-pixel depth information which is used by the VRDs 304 and 306 to modulate wavefront curvature (see FIG. 22).

Static page data may include static images, text, line art and the like. Dynamic page data may include video 822, audio 824, and the like.

A sound generator 820 renders the corresponding audio, if any, optionally spatialised according to the relative positions of the HMD and the coded surface, and/or the virtual position(s) of the sound source(s) relative to the coded surface. Suitable audio spatialisation techniques are described in [41].

The HMD may download dynamic data such as video and audio into a local memory or disk device, or it may obtain such data in streaming fashion from the server, with some degree of local buffering to decouple the local playback rate from any variations in streaming rate due to network behaviour.

Whether the image data is static or dynamic, the image generator 328 constantly re-renders the page data to take into account the current position and orientation of the Netpage with respect to the HMD 300 (and optionally according to gaze direction).

The frame rate of the image generator 328 and the VRDs 304, 306 is at least the critical fusion frequency and is ideally faster. The frame rate of the image generator and the VRDs may be different from the frame rate of a video stream being displayed by the HMD 300. Ideally the image generator utilises motion estimation to generate intermediate frames not explicitly present in the video stream. Applicable techniques are described in [21, 39]. If the video stream utilises a motion-based encoding scheme such as an MPEG variant, then the HMD uses the motion information inherent in the encoding to generate intermediate frames.

As an alternative to the image generator in the HMD performing full page image rendering, the server may perform page image rendering and transmit a corresponding video sequence to the HMD. Because of the latency between pose estimation, image rendering and subsequent display in this scenario, it is advantageous to still transform the resultant video stream according to pose in the HMD at the display frame rate.

More generally, whether image generation occurs on the server or in the HMD, a dedicated image warper 826 can be utilised to perspective-project the video stream according to the current pose, and to generate image data at a rate and at a resolution appropriate to the display, independent of the rate and resolution of the image data generated by the image generator 328. This is illustrated in FIG. 26.

Multi-pass perspective projection techniques are described in [58]. Single-pass techniques and systems are described in [31, 2]. General techniques based on three-dimensional texture mapping are described in [13]. Transforming an input image to produce a perspective-projected output image involves low-pass filtering and sampling the input image according to the projection of each output pixel into the space of the input image, i.e. computing the weighted sum of input pixels which contribute to each output pixel. In most hardware implementations, such as described in [22], this is efficiently achieved by trilinearly interpolating an image pyramid which represents the input image at multiple resolutions. The image pyramid is often represented by a mipmap structure [57], which contains all power-of-two image resolutions. A mipmap only directly supports isotropic low-pass filtering, which leads to a compromise between aliasing and blurring in areas where the projection is anisotropic. However, anisotropic filtering is commonly implemented using mipmap interpolation by computing the weighted sum of several mipmap samples.
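
The mipmap level selection underlying trilinear interpolation can be sketched as follows; this is the standard technique summarised from the paragraph above, with illustrative names, not an implementation from the references:

    import math

    def mip_levels(footprint_px: float):
        """Two mip levels to sample and the blend fraction between them.

        footprint_px is the size of the output pixel's projection in
        input-image pixels; level 0 is the full-resolution image.
        """
        level = max(0.0, math.log2(max(footprint_px, 1.0)))
        lower = int(level)
        return lower, lower + 1, level - lower

Trilinear interpolation then blends bilinear samples from the two returned levels by the returned fraction; anisotropic filtering sums several such samples along the major axis of the footprint.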

In general, image generation for or in the HMD can make effective use of multi-resolution image formats such as the wavelet-based JPEG2000 image format, as well as mixed-resolution formats such as Mixed Raster Content (MRC), which treats line art and text differently to contone image data, and which is also incorporated in JPEG2000.

If there is noticeable latency between initial acquisition of a surface by the HMD, and subsequent display of virtual imagery associated with that surface, then the HMD can signal acquisition of the surface to the user to provide immediate feedback. For example, the HMD can highlight or outline the surface. This also serves to distinguish Netpage tagged surfaces from un-tagged surfaces in the user's field of view. The tags themselves can contain an indication of the extent of the surface, to allow the HMD to highlight or outline the surface without interaction with a server. Alternatively, the HMD can retrieve and display extent information from the server in parallel with retrieving full imagery.

The HMD may be split into a head-mounted unit and a control unit (not shown) which may, for example, be worn on a belt or other harness. If the beam generators are compact, then the head-mounted unit may house the entire VRDs 304 and 306. Alternatively, the control unit may house the beam generators and modulators, and the combined beams may be transmitted to the head-mounted unit via optic fibers.

As described earlier, the user may utilise gaze to move a cursor within the field of view and/or to virtually “select” an object. For example, the object may represent a virtual control button or a hyperlink. The HMD can incorporate an activation button, or “clicker” 828, as shown in FIG. 27, to allow the user to activate the currently selected object. The clicker 828 can consist of a simple switch, and may be mounted in any of a number of convenient locations. For example, it may be incorporated in a belt-mounted control unit, or it may be mounted on the index finger for activation by the thumb. Multiple activation buttons can also be provided, analogously to the multiple buttons on a computer mouse.

Gaze-directed cursor movement can be particularly effective because the precision of the movement of the cursor relative to a surface can be increased by simply bringing the surface closer to the eye.

In the absence of precise gaze tracking, the user may move their head to move a cursor and/or select an object, based simply on the optical axis of the HMD itself.

The HMD can also provide cursor navigation buttons 830 and/or a joystick 832 to allow the user to move a cursor without utilising gaze. In this case the cursor is ideally tied to the currently active tagged surface, so that the cursor appears attached to the surface when relative movement between the HMD and the surface occurs. The cursor can be programmed to move at a surface-dependent rate or a view-dependent rate or a compromise between the two, to give the user maximum control of the cursor.

The HMD can also incorporate a brain-wave monitor 834 to allow the user to move the cursor, select an object and/or activate the object by thought alone [60].

The HMD can provide a number of dedicated control buttons 836, e.g. for changing the cursor mode (e.g. between gaze-directed, manually controlled, or none), as well as for other control functions.

It is sometimes useful to dissociate an SVD from the physical surface to which it is attached. The HMD can therefore provide a control button 836 which allows the user to “lift” an SVD from a surface and place it at a fixed location and in a fixed orientation relative to the HMD field of view. The user may also be able to move the lifted SVD, zoom in and zoom out etc., using virtual or dedicated control buttons. The user may also benefit from zooming the SVD in situ, i.e. without lifting it, for example to improve readability without reducing the viewing distance.

Referring back to FIG. 22, the HMD can include a microphone 838 for capturing ambient audio or voice input 840 from the user, and a still or video camera for capturing still or moving images 844 of the user's field of view. All captured audio, image and video input can be buffered indefinitely by the HMD as well as streamed to a Netpage or other server 812 (FIGS. 23, 24 and 25) for permanent storage. Audio and video recording can also operate continuously with a fixed-size circular buffer, allowing the user to always replay recent events without having to explicitly record them.

The still or video camera 842 can be in line with the HMD's viewing optics, allowing the user to capture essentially what they see. The camera can also be stereoscopic. In a simpler configuration, a single camera is mounted centrally and has an imaging axis parallel to the viewing axes. In a more sophisticated configuration, using appropriate beam-steering optics coupled with the gaze tracking mechanism, the camera can follow the user's gaze. The camera ideally provides automatic focus, but provides the user with zoom control. Multiple cameras pointing in different directions can also be deployed to provide panoramic or rear-facing capture. Direct imaging of the cornea can also capture a wide-angle view of the world from the user's point of view [49].

If the camera is placed in line with the viewing optics, then the corresponding beam combiner can be an LCD shutter, which can be closed during exposure so that the optical path is dedicated to the camera. If the camera is a video camera, then display and capture can be suitably multiplexed, although with a concomitant loss of ambient light unless the exposure time is short.

If the HMD incorporates a video camera, then the Netpage sensor can be configured to use it. If the HMD incorporates a corneal imaging video camera, then it can be utilized by the gaze-tracking system as well as the Netpage sensor.

Audio and video control buttons, for settings as well as for recording and playback, can be provided by the HMD virtually or physically.

Binocular disparity between the images captured by a stereo camera can be used by the HMD to detect foreground objects, such as the user's hand or coffee cup, occluding the Netpage surface of interest. It can use this to suppress rendering and/or projection of the SVD where it is occluded. The HMD can also detect occlusions by analysing the entire visible tagging of the Netpage surface of interest.

An icon representing a captured image or video clip can be projected by the HMD into the user's field of view, and the user can select and operate on it via its icon. For example, the user can “paste” it onto a tagged physical surface, such as a page in a Netpage notebook. The image or clip then becomes permanently associated with that location on the surface, as recorded by the Netpage server, and is always shown at that location when viewed by an authorized user through the HMD. Arbitrary virtual objects, such as electronic documents, programs, etc., can be attached to a Netpage surface in a similar way.

The source of an image or video clip can also be a separate camera device associated with the user, rather than a camera integrated with the HMD.

The HMD's microphone 838 and earphones 800, 802 allow it to conveniently support telephony functions, whether over a local connection such as Bluetooth or IEEE 802.11, or via a longer-range connection such as GSM or CDMA. Voice may be carried via dedicated voice channels, and/or over IP (VoIP). Telephony control functions, such as dialling, answer and hangup, may be provided by the HMD via virtual or physical buttons, may be provided by a separate physical device associated with the HMD or more loosely with the user, or may be provided by a virtual interface tied to a physical surface [7].

The HMD's earphones allow it to support music playback, as described in [8]. Audio can be copied or streamed from a server, or played back directly from a storage device in the HMD itself.

The HMD ideally incorporates a unique identifier which is registered to a specific user. This controls what the wearer of the HMD is authorized to see.

The HMD can incorporate a biometric sensor, as shown in FIG. 28, to allow the system to verify the identity of the wearer. For example, the biometric sensor may be a fingerprint sensor 846 incorporated in a belt-mounted control unit, or it may be an iris scanner 848 incorporated in either or both of the displays 304, 306 (see FIG. 22), possibly integrated with the gaze tracker 382 (see FIG. 16).

The HMD can include optics to correct for deficiencies in a user's vision, such as myopia, hyperopia, astigmatism, and presbyopia, as well as non-conventional refractive errors such as aberrations, irregular astigmatism, and ocular layer irregularities. The HMD can incorporate fixed prescription optics, e.g. integrated into the beam-combining visor, or adaptive optics to measure and correct deficiencies on a continuous basis [18, 56].

The HMD can incorporate an accelerometer so that the acceleration vector due to gravity can be detected. This can be used to project a three-dimensional image properly if desired. For example, during remote conferencing it may be desirable to always render talking heads the right way up, independently of the orientation of the surfaces to which they are attached. As a side-effect, such projections will lean if centripetal acceleration is detected, such as when turning a corner in a car.

The HMD incorporates a battery, recharged by removal and insertion into a battery charger, or by direct connection between the charger and the HMD. The HMD may also conveniently derive recharging power on a continuous basis from an item of clothing which incorporates a flexible solar cell [53]. The item may also be in the shape of a cap or hat worn on the head, and the HMD may be integrated with the cap or hat.

Surface Coding

The scale of the HMD-oriented Netpage tag pattern disposed on a particular medium is matched to the minimum viewing distance expected for that medium. The tag pattern is designed to allow the Netpage sensor in the HMD to acquire and decode an entire tag at the minimum supported viewing distance. The pixel resolution of the Netpage image sensor then determines the maximum supported viewing distance for that medium. The greater the supported maximum viewing distance, the smaller the tag pattern projected on the image sensor, and the greater the image sensor resolution required to guarantee adequate sampling of the tag pattern. Surface tilt also increases the feature frequency of the imaged tag pattern, so the maximum supported surface tilt must also be accommodated in the selected image sensor resolution.

The basis for a suitable Netpage tag pattern is described in [6]. The hexagonal tag pattern described in the reference requires a sampling field of view with a diameter of 36 features. This requires an image sensor with a resolution of at least 72×72 pixels, assuming minimal two-times sampling. By way of example, assuming arbitrarily that the Netpage sensor in the HMD has an angular field of view of 10 degrees, and assuming the minimum supported viewing distance for a hand-held printed page is 30 cm, an appropriate HMD-oriented Netpage tag pattern has a scale of about 1.5 mm per feature (i.e. 30 cm×tan(5°)/(36/2)). Further assuming the maximum supported viewing distance is 120 cm (i.e. 4×30 cm), the required image sensor resolution is 288×288 pixels (i.e. 4×72). Greater image sensor resolution allows for a greater range of viewing distances. By comparison, assuming the minimum supported viewing distance for a large-screen “HDTV” Netpage is 2 m, an appropriate HMD-oriented Netpage tag pattern has a scale of about 1 cm per feature (i.e. 2 m×tan(5°)/(36/2)), and the same image sensor supports a maximum viewing distance of 8 m (i.e. 4×2 m). By way of further comparison, assuming the minimum supported viewing distance for a billboard Netpage mounted on the side of a building is 30 m, an appropriate HMD-oriented Netpage tag pattern has a scale of about 15 cm per feature (i.e. 30 m×tan(5°)/(36/2)), and the same image sensor supports a maximum viewing distance of 120 m (i.e. 4×30 m).
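
The three worked examples above can be reproduced with a few lines of arithmetic. The media labels and loop structure below are editorial; the parameters (10 degree field of view, 36-feature tag diameter, 72-pixel minimum sampling, 288-pixel sensor) are the ones given in the text:

    import math

    fov_deg = 10.0         # sensor angular field of view
    features_across = 36   # tag diameter in features
    min_px = 72            # 2x sampling of 36 features
    sensor_px = 288        # example image sensor resolution

    def feature_scale(min_view_dist_m: float) -> float:
        """Feature pitch so a whole tag fits the FOV at the minimum distance."""
        half_fov = math.radians(fov_deg / 2)
        return min_view_dist_m * math.tan(half_fov) / (features_across / 2)

    for name, d in (("printed page", 0.30), ("HDTV", 2.0), ("billboard", 30.0)):
        scale_mm = feature_scale(d) * 1000
        max_dist = d * sensor_px / min_px
        print(f"{name}: {scale_mm:.1f} mm/feature, max distance {max_dist:.0f} m")
    # printed page: ~1.5 mm/feature, 1.2 m; HDTV: ~10 mm, 8 m; billboard: ~146 mm, 120 m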

Although it is useful for particular media types to utilise a consistent tag pattern scale, it is also possible for individual users to select a tag pattern scale suited to their particular viewing preferences. This is particularly convenient when the Netpages in question are printed on demand.

It is useful to encode the scale of a tag pattern in the data encoded in the pattern, so that a decoding device such as the Netpage HMD can determine the scale and hence the absolute viewing distance without reference to associated information. However, if it is not convenient to encode a scale factor in the tag data, then the scale factor can be recorded by the corresponding Netpage server, either per page instance or per page type. The HMD then obtains the scale factor from the server once it has identified the page. In general, the server records the scale factor as well as an affine transform which relates the coordinate system of the tag pattern to the coordinate system of the physical page.

As described earlier, if a Netpage surface also supports pen interaction, then it may be coded with two sets of tags utilising different infrared inks, one set of tags printed at a pen-oriented scale, and the other set of tags printed at a HMD-oriented scale, as discussed above. Alternatively the surface may be coded with multi-resolution tags which can be imaged and decoded at multiple scales. In another option, if the HMD tag sensor is capable of acquiring and decoding pen-scale tags, then a single set of tags is sufficient. A laser scanning Netpage sensor is capable of acquiring pen-scale tags at normal viewing distances such as 30 cm to 120 cm.
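
A decoder supporting both scales might simply attempt each in turn; the sketch below assumes a hypothetical decode_at routine and illustrative feature sizes:

```python
PEN_SCALE_MM = 0.14   # illustrative pen-oriented feature size
HMD_SCALE_MM = 1.5    # illustrative HMD-oriented feature size

def decode_tag(image, decode_at):
    """Try the pen-oriented scale first, then fall back to the HMD scale."""
    for scale in (PEN_SCALE_MM, HMD_SCALE_MM):
        tag = decode_at(image, scale)  # assumed to return None on failure
        if tag is not None:
            return tag, scale
    return None, None
```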

Since the virtual imagery displayed by the HMD is effectively added to the user's view of the real world, the physical Netpage surface region onto which the imagery is virtually projected is ideally printed black. It is impractical to selectively change the opacity of the HMD visor, since the beam associated with a single pixel may cover the entire exit pupil of the VRD, depending on its depth.

Tags are ideally disposed on a surface invisibly, e.g. by being printed using an infrared ink. However, visible tags may be utilised where invisibility is impractical. Although printing is an effective mechanism for disposing tags on a surface, tags may also be manufactured on or into a surface, such as via embossing. Although inkjet printing is an effective printing mechanism, other printing mechanisms may also be usefully employed, such as laser printing, dye sublimation, thermal transfer, lithography, offset, gravure, etc.

Neither pen-oriented nor HMD-oriented Netpage tags are limited in their application to surfaces traditionally associated with publications, displays and computer interfaces. For example, tags can also be applied to skin in the form of temporary or permanent tattoos; they can be printed on or woven into textiles and fabric; and in general they can be applied to any physical surface where they have utility. HMD-oriented tags, because of their intrinsically larger scale, are more easily applied to a wide range of surfaces than pen-oriented tags.

Applications

FIG. 29 shows a mockup of a printed page 850 containing a typical arrangement of text 858, graphics and images 842. The page 850 also includes two invisible tag patterns 854 and 856. One tag pattern 854 is scaled for close-range imaging by a Netpage stylus or pen or other device typically in contact with or in close proximity to the page 850. The other tag pattern 856 is scaled for longer-range imaging by a Netpage HMD. Either tag pattern may be optional on any given page.

FIG. 30 shows the page 850 of FIG. 29 augmented with a virtual embedded video clip 860 when viewed through the Netpage HMD, i.e. the video clip 860 is a dedicated situated virtual display (SVD) on the page. The video clip appears with playback controls 862. A playback control button can be activated using a Netpage stylus or pen 8 (see FIG. 31). Alternatively a control button can be selected and activated via the HMD's clicker as described earlier. The control buttons 862 can also be printed on the page 850. Alternatively still, a generic Netpage remote control may be utilised in conjunction with the Netpage HMD. The remote control may provide generic media playback control buttons, such as play, pause, stop, rewind, skip forwards, skip backwards, volume control, etc. The Netpage system can interpret playback control commands received from a Netpage remote control associated with a user as pertaining to the user's currently selected media object (e.g. video clip 860).
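
A sketch of this routing, under assumed names (the Netpage system's actual interfaces are not specified here): commands from a user's remote control are dispatched to that user's currently selected media object.

```python
class MediaRouter:
    """Hypothetical dispatcher for remote-control playback commands."""
    def __init__(self):
        self.selected = {}  # user id -> currently selected media object

    def select(self, user_id, media_object):
        self.selected[user_id] = media_object  # e.g. video clip 860

    def on_remote_command(self, user_id, command):
        media = self.selected.get(user_id)
        if media is None:
            return  # no current selection; the command is ignored
        handler = getattr(media, command, None)  # e.g. media.play()
        if callable(handler):
            handler()
```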

The video clip 860 is just one example of the use of an SVD to augment a document. In general, an arbitrary interactive application with a graphical user interface can make use of an SVD in the same manner.

FIG. 31 shows a four-function calculator application 864 embedded in a page 850, with the page augmented with a virtual display 866 for the calculator. The input buttons 868 for the calculator are printed on the page, but could also be displayed virtually.

FIG. 32 shows a page 850 augmented with a display 870 for confidential information only intended for the user. As described earlier, apart from registration of the HMD as belonging to the user, the HMD may verify user identity via a biometric measurement. Alternatively, the user may be required to provide a password before the HMD will display restricted information.

FIG. 33 shows the page 850 of FIG. 29 augmented with virtual digital ink 9 drawn using a non-marking Netpage stylus or pen 8. Virtual digital ink has the advantage that it can be virtually styled, e.g. with stroke width, colour, texture, opacity, calligraphic nib orientation, or artistic style such as airbrush, charcoal, pencil, pen, etc. It also has the advantage that it is only seen by authorized users via their HMDs (or via Netpage browsers).

If all “pen” input is virtual, then multiple physical instances of the same logical Netpage page instance can be printed and used as a basis for remote collaboration or conferencing. Any digital ink 9 drawn virtually by one authorized user instantaneously appears “on” the other instances of the page 850 when viewed by other authorized users.

Even on different logical instances of a page, a subregion can be mapped to a shared “whiteboard” for remote collaboration and conferencing purposes.

Physical and virtual digital ink can also co-exist on the same physical page.

Whether Netpage pen input actually marks the page or is only displayed virtually, and whether pen input is created relative to page content printed physically or displayed virtually, the pen input is captured by the Netpage system as digital ink and is interpreted in the context of the corresponding page description. This can include interpreting it as an annotation, as streaming input to an application, as form input to an application (e.g. handwriting, a drawing, a signature, or a checkmark), or as control input to an application (e.g. a form submission, a hyperlink activation, or a button press) [3].
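
The interpretation step might be sketched as follows, with hypothetical zone kinds mirroring the categories above (the page description API shown is an assumption):

```python
def interpret_digital_ink(stroke, page_description):
    """Resolve a captured stroke against the page description's input zones."""
    zone = page_description.zone_at(stroke.bounds)  # hypothetical lookup
    if zone is None:
        return ("annotation", stroke)                  # free-form annotation
    if zone.kind == "form_field":
        return ("form_input", zone.field_id, stroke)   # e.g. handwriting
    if zone.kind == "control":
        return ("control_input", zone.action)          # e.g. a button press
    return ("streaming_input", zone.app_id, stroke)    # input to an application
```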

FIG. 34 shows another version of the page 850 of FIG. 29, where even the static page content 858 and 852 is virtual and is only seen via the Netpage HMD (or the Netpage browser). In this case the entire page can be thought of as a dedicated SVD for the static and dynamic content of the page. Only the tag pattern(s) 854, 856 exist on the physical page, and the virtual content is associated with the page, possibly by “printing” onto the page by passing it through a virtual “printer” device. The virtual Netpage printer simply determines the page ID of each page which passes through it and associates it with the next document page. The association between page ID and page content is still recorded by the Netpage server in the usual way.
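
A minimal sketch of such a virtual printer, with assumed method names: each blank fed through it yields its page ID, which the server binds to the next document page.

```python
class VirtualPrinter:
    """Hypothetical virtual Netpage printer: binds page IDs to content."""
    def __init__(self, server, document_pages):
        self.server = server
        self.pages = iter(document_pages)

    def feed(self, physical_page):
        page_id = physical_page.read_tagged_id()  # decoded from the page's tags
        content = next(self.pages)                # next page of the document
        self.server.associate(page_id, content)   # recorded in the usual way
```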

Physical pages can be manufactured from durable plastic and can be tagged during manufacture rather than being tagged on demand. They can be re-used repeatedly. New content can be “printed” onto a page by passing it through a virtual Netpage printer. Content can be wiped from a page by passing it through a virtual Netpage shredder. Content can also be erased using various forms of Netpage erasers. For example, a Netpage stylus or pen operating in one eraser mode may only be capable of erasing digital ink, while operating in another eraser mode may also be capable of erasing page content.

Fully virtualising page content has the added advantage that pages can be viewed and read in ambient darkness.

Although not shown in the figures, regions which are augmented with virtual content (such as video clips and the like) are ideally printed in black. Since the output of the Netpage HMD is added to the page, it is ideally added to black to create color and white. It cannot be used to subtract color from white to create black. In regions where black is impractical, such as when annotating physical page content with virtual digital ink, the brightness of the HMD output is sufficiently high to be clearly visible even with a white page in the background.

If plastic blanks are used and all page content is virtual, then the blanks are also ideally black, and matte to prevent specular reflection of ambient light.

FIG. 35 shows a mobile phone device 872 incorporating an SVD. Like the document page discussed above, the display surface 874 includes a tag pattern 856 scaled for longer-range imaging by a Netpage HMD. It also optionally includes a tag pattern 854 scaled for close-range imaging by a Netpage stylus or pen 8, for “touch-screen” operation.

The extent of the SVD 876 need not be constrained by the physical size of the device to which it is “attached”. As shown in FIG. 36, the display 876 can protrude laterally beyond the bounds of the device 872.

The SVD 876 can also be used to virtualise the input functions on the device 872, such as the keypad in this case, as shown in FIG. 37.

Generally also, the SVD 876 can overlay the conventional display 874 of the device 872, such as an LCD or OLED. The user may then choose to use the built-in display 874 or the SVD 876 according to circumstance.

Although the examples show a mobile phone device 872, the same approach applies to any portable device incorporating a display and/or a control interface, including a personal digital assistant (PDA), a music player, an A/V remote control, a calculator, a still or video camera, and so on.

Since, as discussed earlier, the physical surface 874 of an SVD 876 is ideally matte black, it provides an ideal place to incorporate a solar cell into the device 872 for generating power from ambient light.

FIG. 38 shows an SVD 876 used as a cinema screen 878. Note that the scale of the HMD-oriented tag pattern 856 is much larger than in the cases described above, because of the much larger average viewing distance.

The movie is virtually projected from a video source 880, either via direct streaming from a video transmitter 882 to the Netpage HMDs of the members of the audience 884, or via a Netpage server 812 and an arbitrary communications network 814.

Individual delivery of content to each audience member during an otherwise “shared” viewing experience has the advantage that it can allow individual customisation. For example, specific edits can be delivered according to age, culture or other preference; each individual can specify language, subtitle display, audio settings such as volume, and picture settings such as brightness, contrast, color and format; and each individual may be provided with personal playback controls such as pause, rewind/replay, skip, etc.

In a public performance scenario, a Netpage-encoded printed ticket can act as a token which gives a HMD access to the movie. The ticket can be presented in the field of view of the tag sensor in the HMD, and the HMD can present the scanned ticket information to the projection system to gain access.

FIG. 39 shows an SVD used as a video monitor 886, e.g. to display pre-recorded or live video from any number of sources including a television (TV) receiver 888, video cassette recorder (VCR) 890, digital versatile disc (DVD) player 892, personal video recorder (PVR) 894, cable video receiver/decoder 896, satellite video receiver/decoder 898, Internet/Web interface 900, or personal computer 902. Again note that the scale of the HMD-oriented tag pattern 856 is larger than in the page and personal device cases described above, but smaller than in the cinema case.

The video switch 906 directs the video signal from one of the video sources (888-902) to the Netpage HMDs 300 of one or more users. The video is delivered either via direct streaming from a video transmitter 882, or via a Netpage server 812 and an arbitrary communications network 814.

As in the case of cinema described above, video delivered via an SVD has the advantage that it can be individually customised.

FIG. 40 shows an SVD used as a computer monitor 914. The monitor surface includes a tag pattern 856 scaled for imaging by a Netpage HMD. It also optionally includes a tag pattern 854 scaled for close-range imaging by a Netpage stylus or pen 8, for “touch-screen” operation. Video output from the personal computer 902 or workstation is delivered either via direct streaming from a video transmitter 882 to the Netpage HMDs 300 of one or more users, or via a Netpage server 812 and an arbitrary communications network 814.

Another input device 908 is also optionally provided, tagged with a stylus-oriented tag pattern 854. The input device can be used to provide a tablet and/or a virtualised keyboard 910, as well as other functions. Input from the stylus or pen 8 is transmitted to the Netpage server 812 in the usual way, for interpretation and possible forwarding. Although shown separately, the Netpage server 812 may be executing on the personal computer 902.

Multiple monitors 914 may be used in combination, in various configurations.

Advertising in public spaces, if virtually displayed, can be targeted according to the demographic of each individual viewer. People may be rewarded for opting in and providing a demographic profile. Virtually displayed advertising can be more finely segmented, both time-wise, according to how much an advertiser is willing to pay, and according to demographic. Targeting can also occur according to time-of-day, day-of-week, season, weather, external events, etc.

If the advertising appears in (or is attached to) a movable object such as a magazine, newspaper, train, bus or taxi poster, or product packaging, then the advertising content can also be targeted according to the instantaneous location of the viewer, as indicated by a location device associated with the user, such as a GPS receiver.

If the HMD incorporates gaze tracking, then gaze direction information can be used to provide statistical information to advertisers on which elements of their advertising are catching the gaze of viewers, i.e. to support so-called “copy testing”. More directly, gaze direction can be used to animate an advertising element when the user's gaze strikes it.

The Netpage HMD can be used to search a physical space, such as a cluttered desktop, for a particular document. The user first identifies the desired document to the Netpage system, perhaps by browsing a virtual filing cabinet containing all of the user's documents. The HMD is then primed to highlight the document if it is detected in the user's field of view. The Netpage system informs the HMD of the relation between the tags of the desired document and the physical extent of the document, so that the HMD can highlight the outline of the document when detected.
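
In sketch form (names assumed): once primed, the HMD compares each decoded tag against the target document and, on a match, projects the document's known extent as a highlight.

```python
def on_tag_decoded(tag, target_doc, hmd):
    """Highlight the target document when one of its tags enters view."""
    if tag.page_id != target_doc.page_id:
        return  # tag belongs to some other page; keep searching
    # The tag's coordinates locate the page in the field of view; the
    # document's known extent (in tag units) gives the outline to draw.
    outline = target_doc.outline_from(tag)  # hypothetical projection step
    hmd.highlight(outline)
```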

The user's virtual filing cabinet can be extended to contain, either actually or by reference, every document or page the user has ever seen, as detected by the Netpage HMD. More specifically, in conjunction with gaze tracking, the system can mark the regions the user has actually looked at. Furthermore, by detecting the distinctive saccades associated with reading, the system can mark, with reasonable certainty, text passages actually read by the user. This can subsequently be used to narrow the context of a content search.

One of the advantages of the Netpage HMD is that it allows the user to consume and interact with information privately, even when in a public place. However, because each pixel is projected in succession, a snooper can build a simple detection device to collect each pixel in turn from any stray light emitted by the HMD, and re-synchronise it after the fact to regenerate a sequence of images. To combat this, the HMD can emit random stray light at the pixel rate, to swamp any meaningful stray light from the display itself.

A non-planar three-dimensional object, if unadorned but tagged on some or all of its faces, may act as a proxy for a corresponding adorned object. For example, a prototyping machine may be used to fabricate a scale model of a concept car. Disposing tags on the surface of the prototype then allows color, texture and fine geometric detail to be virtually projected onto the surface of the car when viewed through a Netpage HMD.

More simply, a pre-manufactured and pre-tagged shape such as a sphere, ellipsoid, cube or parallelepiped of a certain size can be used as a proxy for a more complicated shape. Virtual projection onto its surface can be used to imbue it with apparent geometry, as well as with color, texture and fine geometric detail.

REFERENCES

The following references are incorporated herein by cross-reference.

-   [1] Lapstun, P. and K. Silverbrook, “Method and System for Printing a Document”, U.S. Pat. No. 6,728,000, issued 27 Apr. 2004
-   [2] Silverbrook, K. and P. Lapstun, “Digital Image Warping System”, U.S. Pat. No. 6,636,216, issued 21 Oct. 2003
-   [3] see Appendix A
-   [4] Silverbrook Research, “Sensing device for coded data”, U.S. patent application Ser. No. 10/815,636 (Docket Number HYJ001), filed 2 Apr. 2004, claiming priority from [9,11,12]
-   [5] Silverbrook Research, “Laser scanner device for printed product identification codes”, U.S. patent application Ser. No. 10/815,609 (Docket Number HYT001), filed 2 Apr. 2004, claiming priority from [11,12]
-   [6] Silverbrook Research, “Rotationally symmetric tags”, U.S. patent application Ser. No. 10/309,358, filed 4 Dec. 2002
-   [7] Silverbrook Research, “Method and system for telephone control”, U.S. patent application Ser. No. 09/721,895, filed 25 Nov. 2000
-   [8] Silverbrook Research, “Viewer with code sensor”, U.S. patent application Ser. No. 09/722,175, filed 25 Nov. 2000
-   [9] Silverbrook Research, “Image sensor with digital framestore”, U.S. patent application Ser. No. 10/778,056 (Docket Number NPS047), filed 17 Feb. 2004, claiming priority from [10]
-   [10] Silverbrook Research, “Methods, systems and apparatus”, Australian Provisional Patent Application 2003900746 (Docket Number NPS041), filed 17 Feb. 2003
-   [11] Silverbrook Research, “Methods and systems for object identification and interaction”, Australian Provisional Patent Application 2003901617 (Docket Number NIR002), filed 7 Apr. 2003
-   [12] Silverbrook Research, “Methods and systems for object identification and interaction”, Australian Provisional Patent Application 2003901795 (Docket Number NIR005), filed 15 Apr. 2003
-   [13] Akenine-Möller, T., and E. Haines, Real-Time Rendering, Second Edition, A K Peters 2002
-   [14] Amir, A., M. D. Flickner, D. B. Koons and C. H. Morimoto, “System and Method for Eye Gaze Tracking Using Corneal Image Mapping”, U.S. Pat. No. 6,659,611, issued 9 Dec. 2003
-   [15] Behringer, R., G. Klinker, and D. W. Mizell, eds., Augmented Reality: Placing Artificial Objects in Real Scenes: Proceedings of IWAR '98, A K Peters 1999
-   [16] Berge, B., and J. Peseux, “Lens with variable focus”, U.S. Pat. No. 6,369,954, issued 9 Apr. 2002
-   [17] Bloebaum, F., “Method and Apparatus for Determining the Light Transit Time Over a Measurement Path Arranged Between a Measuring Apparatus and a Reflecting Object”, U.S. Pat. No. 5,805,468, issued 9 Sep. 1998
-   [18] Blum, R. D., D. P. Dustin, and D. Katzman, “Method for refracting and dispensing electro-active spectacles”, U.S. Pat. No. 6,733,130, issued 11 May 2004
-   [19] Cameron, C. D., D. A. Pain, M. Stanley, and C. W. Slinger, “Computational challenges of emerging novel true 3D holographic displays”, Critical Technologies for the Future of Computing, Proceedings of SPIE Vol. 4109, 2000, pp. 129-140
-   [20] Cleveland, D., J. H. Cleveland and P. L. Norloff, “Eye Tracking Method and Apparatus”, U.S. Pat. No. 5,231,674, issued 27 Jul. 1993
-   [21] Demos, G. E., “System and Method for Motion Compensation and Frame Rate Conversion”, U.S. Pat. No. 6,442,203, issued 27 Aug. 2002
-   [22] Dignam, D. L., “Circuit and method for trilinear filtering using texels from only one level of detail”, U.S. Pat. No. 6,452,603, issued 17 Sep. 2002
-   [23] Duchowski, A. T., Eye Tracking Methodology, Theory and Practice, Springer-Verlag 2003
-   [24] Favalora, G. E., J. Napoli, D. M. Hall, R. K. Dorval, M. G. Giovinco, M. J. Richmond, and W. S. Chun, “100 Million-voxel volumetric display”, Cockpit Displays IX: Displays for Defense Applications, Proceedings of SPIE Vol. 4712, 2002, pp. 300-312
-   [25] Feenstra, B. J., S. Kuiper, S. Stallinga, B. H. W. Hendriks, and R. M. Snoeren, “Variable focus lens”, PCT Patent Application WO 03/069380, filed 24 Jan. 2003
-   [26] Fulton, J. T., Processes in Biological Vision, http://www.4colorvision.com
-   [27] Furness III, T. A., and J. S. Kollin, “Retinal Display Scanning of Image with Plurality of Image Sectors”, U.S. Pat. No. 6,639,570, issued 28 Oct. 2003
-   [28] Furness III, T. A., and J. S. Kollin, “Virtual Retinal Display”, U.S. Pat. No. 5,467,104, issued 14 Nov. 1995
-   [29] Gerhard, G. J., C. T. Tegreene, and B. Z. Eslam, “Scanned Display with Pinch, Timing, and Distortion Correction”, 5 Aug. 1998
-   [30] Gortler, S. J., R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “The Lumigraph”, ACM Computer Graphics Proceedings, Annual Conference Series, 1996, pp. 43-54
-   [31] Heckbert, P. S., “Survey of Texture Mapping”, IEEE Computer Graphics & Applications 6(11), pp. 56-67, November 1986
-   [32] Hornbeck, L. J., “Active yoke hidden hinge digital micromirror device”, U.S. Pat. No. 5,535,047, issued 9 Jul. 1996
-   [33] Humphreys, G. W., and V. Bruce, Visual Cognition, Lawrence Erlbaum Associates, 1989, p. 15
-   [34] Hutchinson, T. E., C. Lankford and P. Shannon, “Eye Gaze Direction Tracker”, U.S. Pat. No. 6,152,563, issued 28 Nov. 2000
-   [35] Isaksen, A., L. McMillan, and S. J. Gortler, “Dynamically Reparameterized Light Fields”, ACM Computer Graphics Proceedings, Annual Conference Series, 2000, pp. 297-306
-   [36] Levoy, M. and P. Hanrahan, “Light Field Rendering”, ACM Computer Graphics Proceedings, Annual Conference Series, 1996, pp. 31-42
-   [37] Lewis, J. R., H. Urey and B. G. Murray, “Scanned Imaging Apparatus with Switched Feeds”, U.S. Pat. No. 6,714,331, issued 30 Mar. 2004
-   [38] Lewis, J. R., and N. Nestorovic, “Personal Display with Vision Tracking”, U.S. Pat. No. 6,396,461, issued 28 May 2002
-   [39] Maturi, G. V., V. Bhargava, S. L. Chen, and R.-Y. Wang, “Hybrid Hierarchial/Full-search MPEG Encoder Motion Estimation”, U.S. Pat. No. 5,731,850, issued 24 Mar. 1998
-   [40] Matusik, W., and H. Pfister, “3D TV: A Scalable System for Real-Time Acquisition, Transmission, and Autostereoscopic Display of Dynamic Scenes”, ACM Computer Graphics Proceedings, Annual Conference Series, 2004
-   [41] McGrath, D. S., “Methods and Apparatus for Processing Spatialised Audio”, U.S. Pat. No. 6,021,206, issued 1 Feb. 2000
-   [42] McMillan, L. and G. Bishop, “Plenoptic Modeling: An Image-Based Rendering System”, ACM SIGGRAPH 95, pp. 39-46
-   [43] McQuaide, S. C., E. J. Seibel, R. Burstein and T. A. Furness III, “50.4: Three-dimensional virtual retinal display system using a deformable membrane mirror”, SID 02 DIGEST
-   [44] Meisner, J., W. P. Donnelly, and R. Roosen, “Augmented Reality Technology”, U.S. Pat. No. 6,625,299, issued 23 Sep. 2003
-   [45] Melzer, J. E., and K. Moffitt, Head Mounted Displays: Designing for the User, McGraw-Hill 1997
-   [46] Miller, G., “Volumetric Hyper-Reality, A Computer Graphics Holy Grail for the 21st Century?”, Graphics Interface '95, pp. 56-64
-   [47] Naumov, A. F., and M. Yu. Loktev, “Liquid-crystal adaptive lenses with modal control”, OPTICS LETTERS, Vol. 23, No. 13, Jul. 1, 1998, pp. 992-994
-   [48] Nayar, S. K., V. Branzoi, and T. E. Boult, “Programmable Imaging using a Digital Micromirror Array”, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, July 2004, pp. 436-443
-   [49] Nishino, K., and S. K. Nayar, “The World in an Eye”, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington DC, June 2004
-   [50] Perlin, K., S. Paxia, and J. S. Kollin, “An Autostereoscopic Display”, ACM Computer Graphics Proceedings, Annual Conference Series, 2000, pp. 319-326
-   [51] Silverman, N. L., B. T. Schowengerdt, J. P. Kelly, and E. J. Seibel, “58.5L: Late-News Paper: Engineering a Retinal Scanning Laser Display with Integrated Accommodative Depth Cues”, SID 03 DIGEST, pp. 1538-1541
-   [52] St.-Hilaire, P., M. Lucente, J. D. Sutter, R. Pappu, C. D. Sparrell, and S. A. Benton, “Scaling up the MIT holographic video system”, Fifth International Symposium on Display Holography, Proceedings of SPIE Vol. 2333, 1992, pp. 374-380
-   [53] Sverdrup, L. H. Jr., N. F. Dessel, and A. Pelkus, “Thin film flexible solar cell”, U.S. Pat. No. 6,548,751, issued 15 Apr. 2003
-   [54] Urey, H., D. W. Wine, and T. D. Osborn, “Optical performance requirements for MEMS-scanner based microdisplays”, Conference on MOEMS and Miniaturized Systems, SPIE Vol. 4178, pp. 176-185, Santa Clara, Calif. (2000)
-   [55] Urey, H., “Apparatus and Methods for Generating Multiple Exit-Pupil Images in an Expanded Exit Pupil”, US Patent Application 2003/0086173, published 8 May 2003
-   [56] Williams, D. R., and J. Liang, “Method and apparatus for improving vision and the resolution of retinal images”, U.S. Pat. No. 5,949,521, issued 7 Sep. 1999
-   [57] Williams, L., “Pyramidal Parametrics”, Computer Graphics (Proc. SIGGRAPH 1983) 17(3), July 1983, pp. 1-11
-   [58] Wolberg, G., Digital Image Warping, IEEE Computer Society Press, 1988
-   [59] Wolf, P. R., and B. A. Dewitt, Elements of Photogrammetry, 3rd Edition, McGraw-Hill 2000
-   [60] Wolpaw, J. R., and D. J. McFarland, “Communication method and system using brain waves for multidimensional control”, U.S. Pat. No. 5,638,826, issued 17 Jun. 1997

CLAIMS

1. An augmented reality device for inserting virtual imagery into a user's view of their physical environment, the device comprising: a see-through display device through which the user can view the physical environment, said display device including a wavefront modulator; a camera for imaging at least one surface in the physical environment; and a controller configured for: capturing, using the camera, at least one image of the surface; determining, at least partially from the at least one captured image, the virtual imagery to be displayed at a predetermined position relative to the surface; determining, at least partially from the at least one captured image, a position of the surface relative to the augmented reality device; generating at least one image based on the virtual imagery and on the position of the surface relative to the augmented reality device, the image including pixel depth information; and displaying the generated image via the display device, including modulating, based on the pixel depth information, the wavefront curvature of the light emitted for each pixel, so that the user sees the virtual imagery at the predetermined position relative to the surface regardless of changes in position of the user's eyes with respect to the display device.

2. An augmented reality device according to claim 1 wherein the display device has two see-through displays, one for each of the user's eyes respectively.

3. An augmented reality device according to claim 1 wherein the display device, the camera and the controller are adapted to be worn on the user's head.

4. An augmented reality device according to claim 1 wherein the display device has a virtual retinal display (VRD) for each of the user's eyes, each of the VRDs scanning at least one beam of light into a raster pattern and modulating the or each beam to produce spatial variations in the virtual imagery.

5. An augmented reality device according to claim 4 wherein the VRD scans red, green and blue beams of light to produce color pixels in the raster pattern.

6. An augmented reality device according to claim 5 wherein the VRDs present different images to each of the user's eyes, the differences being based on eye separation and the distance to the predetermined position of the virtual imagery so as to create a perception of depth via stereopsis.

7. An augmented reality device according to claim 1 wherein the wavefront modulator uses a deformable membrane mirror, a liquid crystal phase corrector, a variable focus liquid lens or a variable focus liquid mirror.

8. An augmented reality device according to claim 1 wherein the virtual imagery is a movie, a computer application interface, computer application output, hand drawn strokes, text, images or graphics.

9. An augmented reality device according to claim 1 wherein the display device has pupil trackers to detect an approximate point of fixation of the user's gaze such that a virtual cursor can be projected into the virtual imagery and navigated using gaze direction.

10. An augmented reality device according to claim 1, further comprising an optical range finder for determining range information using time-of-flight measurement, wherein the controller is configured for determining the position of the surface relative to the augmented reality device using the range information in combination with the at least one captured image.

11. An augmented reality device according to claim 1 wherein the surface has a pattern of coded data disposed thereon, and wherein the controller is configured to at least partially identify the virtual imagery to be displayed using at least part of the pattern of coded data contained in the at least one captured image.

12. An augmented reality device according to claim 11, wherein the pattern of coded data is indicative of an identity of the surface, and wherein the controller is configured to at least partially identify the virtual imagery to be displayed based on the identity of the surface.

13. An augmented reality device according to claim 1, wherein the surface has a pattern of coded data disposed thereon, and wherein the controller is configured to at least partially determine the position of the surface relative to the augmented reality device using at least part of the pattern of coded data contained in the at least one captured image.

14. An augmented reality device according to claim 13, wherein the pattern of coded data disposed on the surface includes a grid of target elements.

15. An augmented reality device according to claim 13, wherein the pattern of coded data disposed on the surface is indicative of a plurality of coordinate locations on the surface, and wherein the controller is configured to at least partially determine the position of the surface relative to the augmented reality device using at least one of the coordinate locations.